Chapter 1 What Is Industrial and Organizational Psychology? 


MODULE 1.3 SUMMARY 


Individualism/collectivism, power distance,
uncertainty avoidance, masculinity/femininity,
and long-term versus short-term orientation
are some of the key considerations in describing
and characterizing various cultures.

It is important for I-O psychologists to recognize
and study the multiplicity of cultural factors
that influence workplace behavior.


Culture is a system of shared meanings and 
ways of viewing events and things. 

The global economy has made it important for 
all countries to foster economic connections 
with others. 


KEY TERMS 


culture
expatriates
“West versus the Rest” mentality
collectivist culture
individualist culture
individualism/collectivism
power distance
uncertainty avoidance
masculinity/femininity
long-term versus short-term orientation
horizontal culture
vertical culture




Several themes run through the chapters of this book. They will be more apparent in some 
chapters than others. The first theme is one of a unified science of industrial and organizational
psychology. Unified means several things in this context. First, to truly understand
work behavior we must be willing to consider and acknowledge the interplay of many
different approaches. For example, when we consider the issue of safety in the workplace, 
we could consider the individual strategies for creating a safe workplace embodied in the 
personnel, organizational, or human engineering approaches. The personnel approach would 
suggest selecting people who are likely to act in safe ways and then training them in those 
ways. The organizational approach might suggest rewarding people for safe behavior and 
reducing stress in the workplace. The engineering approach might endorse modifying the 
environment, equipment, and work procedures to eliminate the hazards associated with 
common accidents, as well as creating and maintaining a climate of safety in individual 
work groups. The unified approach means not preferring one or another of these 
approaches, but realizing that all approaches are useful and can be skillfully applied, either 
individually or in combination depending on the situation at hand. We will apply the same
theme of unity to the many other topics you will encounter in the book.

“Unified” has another meaning in our treatment of I-O psychology. It means that research
and theories from non-American researchers are just as valuable to understanding work
behavior as the work by American researchers. We are all in this together: Our fellow Croatian,
Japanese, Swedish, and New Zealand I-O psychologists are just as intent on understanding
the experience of work as we are, and they are just as skilled at forming theories and
conducting research. For those reasons, we will freely discuss the work of our colleagues
in other countries and combine it with what has been learned in the United States to develop
a broader and deeper understanding of work behavior. As you read in Module 1.3, all
workers and work have been globalized, whether or not they embraced the concept of globalization.
As a result, in many instances the research of a single country will not be sufficient
to understand the behavior of workers in that country or any other. So we will present
you with the best thoughts of those who study work behavior, regardless of the country
in which it is studied.

The second theme that will be apparent in our treatment is a holistic theme. By this we
mean that we cannot and should not try to understand any work behavior by considering
variables in isolation. There is a natural temptation to look for quick and simple answers.
In some senses, the scientific method yields to that temptation by having the goal of parsimony,
that is, choosing simple explanations and theories over complex ones. But in the
real world, unlike the laboratory, we cannot control multiple forces that act on an individual.
Your behavior is not simply a result of your mental ability or of your personality.
The behavior of your instructors is not just the result of their knowledge or attitudes, or
of the culture in which they were raised. These behaviors are influenced by all of those
things, and to consider only one variable as the explanatory variable is an endeavor
doomed to failure. We will remind you frequently that you must look at the person as a
whole entity, not as a single variable. Human behavior in real-world situations is like a
stew. We may know every ingredient that went into that stew, yet the actual experience
of tasting the stew is much more than the single elements that made it up, and is certainly
not described by any one of those elements.

A third theme that will run through the chapters is the vast cultural diversity of virtually
any workforce in any country. A key facet of cultural diversity is differing values. It is
these differing values that present the greatest challenge for employee selection, motivation,
leadership, teamwork, and organizational identification. The organization that can
“solve” the diversity puzzle (i.e., how to effectively integrate diverse ways of thinking and
acting) is likely the organization that will enjoy high productivity, low turnover, and high
satisfaction. I-O psychology can play a major role in helping to accomplish this integration,
and we will address these issues throughout the text.

Parts 


The book is divided into three parts. 


1.4 The Organization of This Book


• The first part presents descriptive information about I-O psychology, some historical
background and principles, and the basic methods of data collection and analysis.

• The second part deals with material that has often been labeled “industrial” (as opposed
to “organizational”). This includes material on individual differences, assessment,
training, performance and its evaluation, and job analysis.

• The third part covers material that is usually referred to as “organizational” and
includes topics such as emotion, motivation, stress, leadership, groups and teams,
fairness and justice, and organizational theory.

TABLE 1.7 Scientific Journals in I-O Psychology

Journal of Applied Psychology
Personnel Psychology
Human Performance
Administrative Science Quarterly
Human Factors
Academy of Management Journal
Academy of Management Review
Annual Review of Psychology
The Industrial-Organizational Psychologist (TIP)
Organizational Behavior and Human Decision Processes
International Review of I/O Psychology
International Review of Applied Psychology
Journal of Occupational and Organizational Psychology
Leadership Quarterly
Training and Development Journal
Applied Psychology: An International Review
International Journal of Selection and Assessment
Work and Stress
Journal of Occupational Health Psychology
Journal of Organizational Behavior
Journal of Personality and Social Psychology
Industrial and Organizational Psychology: Perspectives on Science and Practice
Australian Journal of Management
European Journal of Work and Organizational Psychology
Romanian Journal of Industrial Psychology

Resources 


As a student of I-O psychology, you will want to consult resources beyond those offered
by your instructor, this text, and its supplements. The most important of these resources 
are knowledge bases. This knowledge can come in two forms: paper and electronic. The 
electronic resources are websites and search engines that will identify useful information 
for you. Because website addresses change frequently, we will use this book’s companion 
website to list the most useful websites for the material covered in each chapter. The paper 
resources are the various journals and books that provide information about the topics 
covered in the text. Table 1.7 presents a list of the most common scientific journals that 
carry articles relevant to the text material. 

In the references at the end of the book, you will see these journals cited frequently. If
you want to do additional reading on a topic or are preparing a course paper or project,
you should go to these journals first. In addition to journals, SIOP publishes the most
current thinking on various topics in two series: the “Frontiers Series” for research and the
“Practice Series” for practice. These volumes present the most recent work available from
some of the world’s best I-O psychologists. Table 1.8 provides the titles and publication
year of each of these volumes. These books represent another excellent information base
for your further reading.

One final published source is the Annual Review of Psychology series published by Annual
Reviews, Inc. One volume is published each year, with separate chapters covering all of
the major areas of psychology, including I-O psychology. This is an excellent resource for
higher-level reading on topics of interest. In the years 2002-2008, the I-O related chapters
in the Annual Review have included: Organizational Psychology or Organizational
Behavior (2002); Human Factors (2003); Small Groups (2004); Work Motivation, Groups
and Teams, Leadership, Personnel Evaluation and Compensation (2005); Understanding
Affirmative Action (2006); Organizational Groups and Teams (2007); and Cognition in
Organizations (2008). In the past 10-year period, virtually every topic of importance to
I-O psychologists has been addressed by one or more Annual Review chapters.













TABLE 1.8 Useful Scientific and Practical Texts in I-O Psychology

ORGANIZATIONAL FRONTIERS SERIES

Career Development in Organizations: Hall (1986)
Productivity in Organizations: Campbell & Campbell (1988)
Training and Development in Organizations: Goldstein (1989)
Organizational Climate and Culture: Schneider (1990)
Work, Families, and Organizations: Zedeck (1992)
Personnel Selection in Organizations: Schmitt & Borman (1993)
Team Effectiveness and Decision Making: Guzzo & Salas (1994)
The Changing Nature of Work: Howard (1995)
Individual Differences and Behavior in Organizations: Murphy (1996)
New Perspectives on International Industrial/Organizational Psychology: Earley & Erez (1997)
The Changing Nature of Performance: Ilgen & Pulakos (1999)
Multilevel Theory, Research, and Methods in Organizations: Klein & Kozlowski (2000)
Work Careers: Feldman (2002)
Measuring and Analyzing Behavior in Organizations: Drasgow & Schmitt (2002)
Emotions in the Workplace: Lord, Klimoski, & Kanfer (2002)
Personality and Work: Barrick & Ryan (2003)
Managing Knowledge for Sustained Competitive Advantage: Jackson, Hitt, & DeNisi (2003)
Health and Safety in Organizations: Hofmann & Tetrick (2003)
You Can Help Build Better Organizations by Becoming an I-O Psychologist: SIOP (2004)
The Dark Side of Organizational Behavior: Griffin & O’Leary-Kelly (2004)
Discrimination at Work: Dipboye & Colella (2004)
Situational Judgment Tests: Weekley & Ployhart (2006)
The Psychology of Entrepreneurship: Baum, Frese, & Baron (2007)
The Psychology of Conflict and Conflict Management in Organizations: De Dreu & Gelfand (2007)
Perspectives on Organizational Fit: Ostroff & Judge (2007)
Work Motivation: Kanfer, Chen, & Pritchard (2008)

PROFESSIONAL PRACTICE SERIES

Compensation in Organizations: Rynes & Gerhart (2000)
Creating, Implementing, and Managing Effective Training: Kraiger (2001)
Diagnosis for Organizational Change: Howard (1994)
Diversity in the Workplace: Jackson (1992)
Employees, Careers, and Job Creation: London (1995)
Evolving Practices in Human Resource Management: Kraut & Korman (1999)
Individual Psychological Assessment: Silzer & Jeanneret (1998)
Managing Selection in Changing Organizations: Kehoe (1999)
Organizational Development: Waclawski & Church (2001)
Organizational Surveys: Kraut (1996)
Performance Appraisal: Smither (1998)
The 21st Century Executive: Silzer (2001)
The Nature of Organizational Leadership: Zaccaro & Klimoski (2001)
Organization Development: Waclawski & Church (2002)
Implementing Organizational Interventions: Hedge & Pulakos (2002)
Creating, Implementing, and Managing Effective Training and Development: Kraiger (2002)
Resizing the Organization: DeMeuse & Marks (2003)
Improving Learning Transfer in Organizations: Holton & Baldwin (2003)
The Brave New World of eHR: Gueutal & Stone (2005)
Employment Discrimination Litigation: Landy (2005)
Getting Action from Organizational Surveys: Kraut (2006)
Customer Service Delivery: Fogli & Salas (2006)
Alternative Validation Strategies: McPhail (2007)


MODULE 1.4 SUMMARY 


This book treats I-O psychology in a unified and
holistic manner.

The three parts of this book discuss the basics
of I-O, industrial topics, organizational topics,
and the work environment.

You should use this book’s supplements, I-O
journals, websites, SIOP Frontiers Series and
Practice Series, and the Annual Review of Psychology
to find additional information for your
course work, papers, and projects.


CASE STUDY 1.1 POLICE OFFICER, MILFORD, USA


Case study exercise


Below is a realistic case for you to read that
includes topics from every chapter in the book. After
each paragraph, you will find the number of the
chapter that contains information relevant to the
case. We don’t expect you to be able to “solve”
the case or address the issues presented. Instead, we
present the case as a vivid example of the complexity
of work behavior and environments. After you complete
each chapter in class, you will find it useful to
come back to this case, identify the paragraphs relevant
to the chapter you have read, and determine how
the chapter material applies to the case. What we
want you to do now is simply read and appreciate
the experience of work in the 21st century.


Welcome to Milford. We’re a “rust belt” survivor:
Unlike a lot of towns around here, we’ve actually
grown in the past few decades. Our town, with a
current population of 600,000, has a fairly good
economic base, since we have an auto assembly
plant and a glass factory, as well as a state university
campus and a regional airport. About 50,000
of our residents are Hispanic, mostly descended
from Mexicans and Central Americans who came
to the area decades ago as migrant farm workers
and moved into town when agriculture declined.
We have the kinds of problems with crime and
drugs that you’d expect in any city of this size,
but on the whole it’s a pretty good place to live.
(Chapter 1)

I’m 48 now and a captain of patrol with the
Milford Police Department. I started working for
the department when I was 26 years old, a little
later than most of the other officers. Before that
I spent five years in the navy maintaining sonar
units, then used my military education benefit to
get a college degree in law enforcement. At age 33
I was promoted from patrol officer to patrol sergeant.
Five years after that I became a lieutenant,
and in four more years I made my present grade.
Except for two years behind a desk during my stint
as a lieutenant, I’ve spent my entire career on the
street. (Chapter 3)

I’ve seen lots of different “systems” in my 22 years
on the force. Systems for hiring, systems for promotion,
systems for discipline, systems for training.
If you didn’t like a particular system, all you had
to do was wait a few years, and it would change
because someone in power wanted a change. But
now I’m a “person in power” and I have a say in
these “systems.” I remember reading once that the
best performance evaluation system people ever
saw was the one they had in their last job. Boy,
is that on the money! You hear people’s criticisms
and you try to think of a way to make the system
better, but no matter what you propose, they come
back and complain that the old way was better.
Sometimes it seems my work life was easier when
I just had to put up with systems, not help make them
up. (Chapters 5, 6, and 11)

I have four direct reports: patrol lieutenants, 
shift commanders. We work the evening watch, 
which is roughly from 3:00 p.m. to 11:00 p.m. I say
“roughly” because some of the subordinates come
in at 2:30 and others leave at 11:30 to cover shift
changes. We tried a rotating shift schedule many 
years ago, but that was a disaster. Now we follow a 
fixed shift system in which officers work consistent 
hours weekdays and every third weekend. Shift 
assignment follows strict seniority rules—new hires 
and newly promoted officers work what we call 
“graveyard”—the night shift, from 11:00 p.m. to 7:00 
a.m. They have to wait until someone else quits, is 
fired, or retires before they can move up to evening 
and day shifts. This is good and bad. It’s good for 
the cops because building your seniority to move 
off the graveyard shift, and eventually making it to 
the day shift, is something to look forward to. But 
it’s bad from a law enforcement standpoint because 
there’s a natural tendency to have a lot more problems
during graveyard hours than at other times. I
mean, when John Q. Public decides to act stupid, 
it’s usually between 11:00 p.m. and 7:00 a.m. And 
here we are with mostly green recruits right out of 
the police academy on that shift, being supervised 
by newly promoted officers. We don’t even have any 
top echelon officers working: lieutenants cover the 
captain’s duty on the night shift. If you ask me, 
the department would have far fewer problems 
with patrol officer performance if only you could 
put the most experienced officers and supervisors 
where they were needed the most, on night watch. 
(Chapters 10 and 12) 

There’s a new breed of officer that I’ve been seeing
in the past 5 to 10 years and I can’t say I like
it. These young guys don’t really seem as committed
to being a cop as I was when I started. They
treat it like a “job,” not like a profession. They
use all their sick time, they’re always looking to join
“better” departments where the pay scale is higher.
They seem to “expect” that they’ll be respected by
civilians and fellow officers. They don’t seem to
understand that respect is earned, not bestowed.
Something funny happens to them after the academy.
When they arrive for their first shift after
graduation, they’re like kids starting first grade.
Big eyes, lots of questions, asking for “feedback,”
asking for responsibility. They think they can do it
all. But in less than a year, it’s a different story. You
have to stay on their case to get anything done. They
take longer meal breaks, find more excuses for not
being able to respond to a call, saunter in two minutes
before roll call. Just another job to them.
(Chapters 7, 8, and 9)

Maybe this is because of the way recruits are 
tested. When I started, only the best were hired. 
The person who got the highest score on the civil 
service test, and was the fastest and the strongest 
on the physical ability test, was the person who got 
hired. You filled out a questionnaire to see if you 
had emotional problems; this was reviewed by the 
department shrink. You took a medical and they 
ran a background check on you. But now it’s different.
Now, in addition to the civil service test,
recruits are interviewed about things like “interests” 
and “values” and “ethics.” They also take a personality
test, whatever that is. And they fill out a
form about what they like and don’t like in a job. 
I don’t understand why they changed the system. 
Bad guys are still bad guys and they still do bad 
things. What’s so complicated about that? You 
want a cop who is stand-up, not afraid to do what 
it takes. You want a cop who is honest. You want 
a cop who is in for the long haul and who understands
the chain of command. Why, all of a
sudden, does the cop have to have a “personality”? 
(Chapter 3) 

Another thing is the job is getting much more
technical. The city council just approved funds for
giving every cop a personal digital assistant/handheld
computer with WiFi capabilities. They thought
the police officers would love coming into the 21st
century. According to them, we can use the computers,
GPS systems, listen to departmental podcasts,
and use handheld PDAs to analyze various crime
patterns in our sectors, run more sophisticated
checks for wants and warrants, look at traffic patterns
for selective enforcement of moving violations
where they are most dangerous. They don’t realize
that we are being suffocated with equipment. It takes
10 minutes now just to load the stuff in the patrol
car and get it up and running, and we have to use it
all simultaneously!! (Chapter 3)


Since I’ve been on the force I’ve seen various
trends and fads come and go. The latest buzzwords
seem to be statistical control or “stat con” where
crime is analyzed by an “operations researcher.” Not
surprisingly, the new cops seem to take to stat con
better than the veterans, and this is causing a certain
amount of friction. The veterans call the new
cops who understand stat con “stat cops.” The new
cops joke that the veterans are “knuckle draggers”
and “Neanderthals.” It doesn’t help that some of the
new cops score better on the promotional exams and
get promoted faster. That means that the younger
“bosses” don’t always have the respect of the older
subordinates. And the beat cops who aren’t promoted
just get older and more cynical. (Chapters 3,
6, and 14)

The force has changed in other ways as well. When
I started, I could count the number of female
police officers in the department on one hand and
have a finger or two left over. Now, the department
is almost 40 percent female. That makes it tougher
for the supervisors, because even though the law
says they have to treat females the same as males,
in reality the women get treated with kid gloves
because the bosses are afraid of being accused of
bias. Given the Hispanic community in town,
we’ve always had a small but steady percentage of
Hispanic officers. In recent years more blacks have
gotten on the force, and now black officers outnumber
the Hispanics. This has led to rivalry and
competition, particularly when it comes to promotion
exams. Everybody counts to see how many
white, black, and Hispanic officers are promoted.
Since the Hispanics have more seniority, they expect
to have more promotions, but the blacks figure since
there are more of them there should be proportionately
more black supervisors. To make matters
worse, the female officers always seem to do better
on the exams than the men. As a result, I actually
report to a female assistant chief. And she’s been in
the department for only 13 years! But she’s a great
test taker. And everybody knows that we had to have
a woman somewhere up there in the chain of command,
whether she could do the job or not. The
problem is that you don’t get respect from being a
good test taker, you get it for being a good cop. Most
of the officers don’t pay much attention to her. Her
decisions are always second-guessed and checked out
with the other two male assistant chiefs. (Chapter 11)




The chief of police is a good guy. He has been in 
the department for just 16 years, but he’s a college 
grad and went nights to get a master’s in public 
administration. He’s sharp and is always trying out 
new stuff. Last year he hired an outside consulting 
firm to run the promotion exams. Some of the 
consultants are psychologists, which was a little 
strange since the department already has a shrink 
on retainer. For the last sergeant’s exam, they had 
a bunch of the current sergeants complete some 
“job analysis” forms. That seemed like a pretty big 
waste of time. Why not just get the most experienced
officers together for an hour’s discussion
and have the consultants take notes? This “job 
analysis” thing led to an unusual promotion examination.
They did use a knowledge test that made
sure the candidates knew the criminal code and 
department policies, but they also had the candidates
play the role of a sergeant in front of a panel
of judges. The judges were from other police 
departments, which didn’t make sense. How could 
they know what would be the right way to behave 
in our department? But the good part was that 
everyone got a score for performance and this 
score was added to the written test score. It was 
objective. The time before that, the lieutenants and 
captains got together for a few hours and talked 
about all the candidates and just made up a list 
according to their experiences with the candidates, 
which everybody agreed was unfair. (Chapters 1, 2, 
and 4) 

Over the past several years, there has been a new 
kind of tension in the department. It became really 
obvious after 9/11, but it was building up before that 
and has not gone away. In a nutshell, officers are 
more interested in having a life outside the department
instead of making the department their
whole life. I’m talking about the veterans, not just 
the young new guys. Their mind is not on their work 
the way it ought to be. It got a lot worse after 
9/11 because we’ve been expected to do so much 
more but with the same number of officers. We’ve 
tripled our airport detachment and had to post 
officers at city court, the bus station, and the power 
plant. And we’ve been getting a lot more calls 
about suspicious people and activities, especially 
people from other countries. A lot of private citizens
have gotten so distrustful of foreigners that they
act as if they’d like nothing better than to have the
police force arrest every illegal alien it could find.
But when you talk to those same people at work,
all of a sudden they’d just as soon we look the other
way because they know their corporate profits
depend on having a cheap labor pool of undocumented
workers. You’d expect the immigration
authorities to give us some guidelines about this, but 
when I think about them I get just as annoyed as I 
do thinking about the other federal law enforcement 
agencies we’re supposed to “interface” with now. We 
get overlapping or conflicting information from 
the different agencies, or else the information is so 
general or outdated that it doesn’t really help us do 
our jobs. All of this extra responsibility has meant 
lots of overtime. Overtime used to be a big reward; 
now it’s a punishment. And it looks like sick leave 
is off the charts. Some of the officers are snapping 
at each other and civilians. And a week does not go 
by when we don’t have at least one police cruiser 
in a wreck. The officers seem really distracted. 
(Chapter 10) 

Citizens are not doing so well either. The economy
headed south and unemployment went way
up. Ever notice that people act funny when they lose 
their jobs? Men are the worst. They feel worthless 
and angry. They do things that they would never 
do otherwise. A traffic stop turns into World War 
III. All of a sudden, we cops are the enemy. A complete
about-face from the first few months after 9/11
when we were the heroes. We still remember that. 
People couldn’t thank us enough for keeping them 
safe. It made us feel really good. It sure helped 
recruiting, too. All of a sudden we had a big surge 
in applicants to take the test. It was nice while it 
lasted. But when the conventions canceled, and the 
orders for cars dropped, and the state had to start 
cutting back on services, we were not the good 
guys anymore. I know life is tough. I know people 
take hits when the economy sours, but how can we 
help them? It’s not our fault. We just end up dealing
with the consequences. (Chapters 9 and 10)

One of the other captains just came back from a 
seminar on new techniques in law enforcement 
where they talked about “teams.” I don’t know 
what is so new about that. We’ve always had squads 
of officers assigned to a beat or sector. But he says 
that is not what they were talking about. They were 
talking about putting teams together based not 
on seniority but on abilities and interests. They 
were talking about “competencies,” whatever that
means. I didn’t think the officers would go for this.
But he showed us a half-dozen studies that were done 
by people from top law enforcement schools. 
When I looked at the results, I had to agree that this 
team approach might actually work. What they 
found, when they gave it a chance, was that the 
officers were more involved in their jobs, felt more 
in control, and took pride in showing that they were 
more effective on patrol. Response times went down, 
more problems were handled with summonses 
and bench warrants rather than arrests, and resisting
arrest charges and claims of brutality went down.
The studies showed that sick leave went down as well. 
Hard to understand, but the numbers don’t lie. 
Maybe these experts are right, “team” policing 
would help us get out of this slump. (Chapter 13) 

But the academics can have some pretty dumb 
ideas, too. One of the sergeants is taking a course 
at the community college on employee attitudes, so 
he asked some of us to fill out a form to determine 
our job satisfaction. It was pretty simple, just 
asking us to agree or disagree with some sentences. 
He took the forms away, analyzed them, and came 
back to tell us what we already knew. We like our 
work, we don’t like our bosses, there are too many 
“rules” we have to follow in the department, our 
pay is OK, and we can get promoted if we are 
lucky and study hard. Why did we need to fill out 
a questionnaire to come to those conclusions? 
(Chapter 9) 

The job satisfaction questionnaire did reveal one 
sore point that all of us can sense. We have a problem
in the department with leadership. There are
plenty of “bosses” but not many of them are leaders.
Some of them play tough and threaten their
officers with three-day suspensions for making 
mistakes. If the officer dares to speak back, it 
might go up to five days off without pay. Some of 
the other bosses try to be your friend, but that won’t 
work either. What you want is someone to help you 
do your job, not somebody to have coffee with. I 
was lucky when I was a lieutenant. My captain was 
somebody I could look up to, and he showed me 
what it takes to be a good cop and a good leader at 
the same time. Not everybody gets that kind of training.
It seems like there should be a training program
for bosses, just like there’s one for new officers— 
sort of an academy for supervisors. I mentioned that 
to one of the other captains, who said it would be 
money down the drain. You would take officers off
the street and have nothing to show for it. I know
there’s an answer to his objection, but I can’t think 
what it would be. (Chapter 12) 

There is another problem that I see, but the 
questionnaire didn’t ask any questions about it. 
We don’t really trust anybody except other cops. The 
courts seem to bend over backward to give the 
perps a break. The lawyers are always picking away 
at details of the arrest. Are you sure you saw him 
on the corner before you heard the alarm? Did you
ask for permission to search her purse? How could 
you see a license plate from 25 feet away when the 
sun was almost down? Even the DA, who is supposed
to be on your side, is always telling you that
you should have done something differently. The 
problem is that this makes the average cop just want 
to make up whatever details are necessary to get the 
perp off the street. On the one hand, I can’t dis¬ 
agree with my officers that the system seems to be 
working against us and what we are trying to do. 
On the other hand, I don’t think that this is the way 
a police department should think or behave. We
should all be working together in the criminal justice
system, not trying to “win.” (Chapter 14)

But this is why the police force feels defensive 
against the rest of the city government. And why 
we were not pleased last week when the mayor 
announced that all city employees would have 
their performance evaluated once a year, and that 
includes the police department. Something to do 
with “accountability in government.” He has asked 
each city department to develop its own system, but 
it has to be numerical. He also said that there have 
to be consequences for poor performance. The 
chief is thinking about preventing anyone from 
taking a promotion exam if his or her performance 
is “unacceptable,” but he hasn’t told us how he plans 
to determine what is acceptable. I think this is a 
disaster waiting to happen. (Chapter 5) 

I’m proud of being a cop and I feel a lot of 
satisfaction with what I’ve achieved in my career. 
But to tell you the truth, the way things are going 
around here, retirement is starting to look more and 
more attractive to me. (Chapter 9) 









Chapter 2 Methods and Statistics in I-O Psychology


Module 2.1 Science
  What Is Science?
  The Role of Science in Society
  Why Do I-O Psychologists Engage in Research?

Module 2.2 Research
  Research Design
  Methods of Data Collection
  Qualitative and Quantitative Research
  The Importance of Context in Interpreting Research
  Generalizability and Control in Research
  Generalizability
  Case Study 2.1
  Control
  Ethical Behavior in I-O Psychology

Module 2.3 Data Analysis
  Descriptive and Inferential Statistics
  Descriptive Statistics
  Inferential Statistics
  Statistical Significance
  The Concept of Statistical Power
  Correlation and Regression
  The Concept of Correlation
  The Correlation Coefficient
  Multiple Correlation
  Correlation and Causation
  Meta-Analysis
  Micro-, Macro-, and Meso-Research

Module 2.4 Interpretation
  Reliability
  Test-Retest Reliability
  Equivalent Forms Reliability
  Internal Consistency
  Inter-Rater Reliability
  Validity
  Criterion-Related Validity
  Content-Related Validity
  Construct-Related Validity
  Validity and the Law: A Mixed Blessing









MODULE 2.1 


Science 


What Is Science? 


For many of us, the term “science” evokes mental images of laboratories, test tubes, and 
computers. We may imagine people wearing white lab coats, walking around making notes 
on clipboards. Certainly laboratories are the homes for some scientific activity and some 
scientists do wear white lab coats, but the essence of science is not where it is done or 
how scientists are dressed. Science is defined by its goals and its procedures. 

All sciences share common goals: the understanding, prediction, and control of some 
phenomenon of interest. Physics addresses physical matter, chemistry addresses elements 
of matter, biology deals with living things, and psychology is concerned with behavior. 
The I-O psychologist is particularly interested in understanding, predicting, and influencing
behavior related to the workplace. All sciences also share certain common methods
by which they study the object of interest, whether that object is a chemical on the 
periodic table of elements or a human being employed in a corporation. These common 
methods include: 

1. A logical approach to investigation, usually based on a theory, a hypothesis, or 
simply a basic curiosity about an object of interest. In I-O psychology, this might
be a theory about what motivates workers, a hypothesis that freedom to choose work 
methods will lead workers to be more involved with their work, or curiosity about 
whether people who work from their homes are more satisfied with their jobs than 
people who work in offices. 

2. Science depends on data. These data can be gathered in a laboratory or in the real 
world (or, as it is sometimes referred to, the field). The data gathered are intended 
to be relevant to the theory, hypothesis, or curiosity that precipitated the investigation.
For example, I-O psychologists gather data about job performance,
abilities, job satisfaction, and attitudes toward safety.

3. Science must be communicable, open, and public. Scientific research is published 
in journals, reports, and books. Methods of data collection are described, data are 
reported, analyses are displayed for examination, and conclusions are presented. 
As a result, other scientists or nonscientists can draw their own conclusions about 
the confidence they have in the findings of the research or even replicate the research
themselves. In I-O psychology, there is often debate (sometimes heated argument)
about theories and hypotheses. The debate goes on at conferences, in journals,
and in books. Anyone can join the debate by simply reading the relevant reports
or publications and expressing opinions on them, or by conducting and publishing
their own research.

4. Science does not set out to prove theories or hypotheses. It sets out to disprove 
them. The goal of the scientist is to design a research project that will eliminate 
all plausible explanations for a phenomenon except one. The explanation that can¬ 
not be disproved or eliminated is the ultimate explanation of the phenomenon. For 
example, in lawsuits involving layoffs brought by older employees who have lost 
their jobs, the charge will be that the layoffs were caused by age discrimination on 
the part of the employer. A scientific approach to the question would consider that 
possibility, as well as the possibility that the layoffs were the result of: 

• Differences in the past performance of the individuals who were laid off.

• Differences in the skills possessed by the individuals.

• Differences in projected work for the individuals.

• Differences in training, education, or credentials of the individuals.

5. One other characteristic of science that is frequently mentioned (MacCoun, 1998; 
Merton, 1973) is that of disinterestedness—the expectation that scientists will be 
objective and not influenced by biases or prejudices. Although most researchers are, 
and should be, passionately interested in their research efforts, they are expected 
to be dispassionate about the results they expect that research to yield—or, at the 
very least, to make public any biases or prejudices they may harbor. 


Disinterestedness 
Characteristic of scientists 
who should be objective 
and uninfluenced by biases 
or prejudices when 
conducting research. 


It will become apparent as we move through the chapters of this book that I-O psychology
is a science. I-O psychologists conduct research based on theories and hypotheses.
They gather data, publish those data, and design their research in a way that
eliminates alternative explanations for the research results. I-O psychologists (and scientists
in general) are not very different from nonscientists in their curiosity or the way they
form theories, hypotheses, or speculations. What sets them apart as scientists is the 
method they use. 


The Role of Science in Society 


We are often unaware of the impact that science has on our everyday lives. The water we 
drink, the air we breathe, even the levels of noise we experience have been influenced by 
decades of scientific research. Consider the challenge faced by a pharmaceutical company 
that wants to make a new drug available to the public. The Food and Drug Administra¬ 
tion (FDA) requires the pharmaceutical company to conduct years of trials (experiments) 
in the laboratory and in the field. These trials must conform to the standards of accept¬ 
able science: They will be based on a theory, data will be gathered, compiled, and inter¬ 
preted, and all alternative explanations for the effects of the drug will be considered. In 
addition, the data will be available for inspection by the FDA. Before the drug can be released 
to the public, the FDA must agree that the data show that the drug actually makes a con¬ 
tribution to medicine, and that it has no dangerous side effects. 

As you will see in a later section of this chapter that deals with ethics, the burden of
trustworthy science must be shouldered by a trustworthy scientist. An example is provided
by a 2008 Congressional inquiry involving the pharmaceutical company Pfizer and its
cholesterol drug Lipitor. A Lipitor advertising campaign was launched in 2006 featuring
Robert Jarvik, the physician who was famous for developing an artificial heart valve. In
one ad, Jarvik was engaged in a vigorous rowing exercise on a lake immediately after endorsing
the drug. When the public learned that a stunt double actually did the rowing, the
drug and Jarvik were the objects of immediate criticism. Moreover, it was revealed that
although Jarvik held a medical degree, he had never completed the certification necessary
to practice medicine. Thus he was not qualified to give medical advice, which he appeared
to be doing in the ads. The inauthentic scientist brought the science into question.

Expert witness Witness in
a lawsuit who is permitted
to voice opinions about
practices.

The importance of the scientific method for the impact of human resource and I-O practices
can also be seen in society, particularly in the courts. As we will see in several of the
chapters that follow (most notably, Chapter 6), individuals often bring lawsuits against
employers for particular practices, such as hiring, firing, pay increases, and harassment.
In these lawsuits, I-O psychologists often testify as expert witnesses. An expert witness,
unlike a fact witness, is permitted to voice opinions about practices. An I-O psychologist
might be prepared to offer the opinion that an employer was justified in using a test, such
as a test of mental ability, for hiring purposes. This opinion may be challenged by opposing
lawyers as “junk science” that lacks foundation in legitimate scientific research. You
will recall that we described “junk science” in Chapter 1 as a fascinating topic supported
only by shoddy research. The scientific method is one of the most commonly accepted
methods for protecting individuals from the consequences of uninformed speculation.


In most of your course texts, you will be exposed to “theory.” Think of theories as either 
helpful or not helpful, rather than “right” or “wrong.” Klein and Zedeck (2004) remind 
us that “Theories provide meaning. Theories specify which variables are important and 
for what reasons, describe and explain relationships that link the variables” (p. 931). They 
suggest that good theories: 




• offer novel insights;
• are interesting;
• are focused;
• are relevant to important topics;
• provide explanations;
• are practical.


As you read the material that will follow in the subsequent chapters and, more import¬ 
antly, if you dig further and read the original statements of the theories, keep these char¬ 
acteristics of good theory in mind in order to decide which ones are helpful and which 
ones are not. 


Why Do I-O Psychologists Engage in Research?


An old truism admonishes that those who do not study history are condemned to repeat 
it. In Chapter 1, we cautioned that researchers studying emotional intelligence but ignor¬ 
ing 60 years of research on social intelligence might have been condemned in just that 
way. A less elegant representation of the same thought was the movie Groundhog Day, in 
which Bill Murray gets to repeat the events of a particular day over and over again, learn¬ 
ing from his mistakes only after a very long time. Without access to scientific research, 
the individuals who make human resource (HR) decisions in organizations would be in 
Murray’s position, unable to learn from mistakes (and successes) that are already docu¬ 
mented. Each HR director would reinvent the wheel, sometimes with good and sometimes 
with poor results. By conducting research, we are able to develop a model of a system 
—a theory—and predict the consequences of introducing that system or of modifying a 
system already in place. Remember that in Chapter 1 we described the importance of research 
in the scientist-practitioner model. Even though you may not actually engage in scientific 
research, you will certainly consume the results of that research. 

Consider the example of hiring. Imagine that an organization has always used a first- 
come, first-served model for hiring. When a job opening occurs, the organization adver¬ 
tises, reviews an application blank, does a short unstructured interview, and hires the first 
applicant who has the minimum credentials. Research in I-O psychology has demonstrated
that this method does not give the employer the best chance of hiring successful employees.
An employer that conducts a structured job-related interview, and that also includes
explicit assessments of general mental ability and personality, will tend to make better hiring
decisions. We can predict this because of decades of published research that form the
foundation for our theory of successful hiring. When an organization decides on a course 
of action, it is predicting (or anticipating) the outcome of that course of action. The bet¬ 
ter the research base that employers depend on for that prediction, the more confident 
they can be in the likely outcome. Both science and business strategy are based on the 
same principle: predictability. Business leaders prefer to avoid unpleasant surprises; 
theory and research help them to do so. 


MODULE 2.1 SUMMARY 


Like other scientists, I-O psychologists conduct
research based on theories and hypotheses. They
gather data, publish those data, and design their
research to eliminate alternative explanations
for the research results.

The scientific method has important repercussions
in society, particularly in the courts where
I-O psychologists often testify as expert witnesses.

I-O research is important to organizations
because every course of action that an organization
decides on is, in effect, a prediction or
anticipation of a given outcome. The better the
research base that supports that prediction,
the more confident the organization can be of
the outcome.


KEY TERMS 


science 
hypothesis 
disinterestedness 
expert witness 









MODULE 2.2 



Research 


Research Design 


In the introductory section, we have considered the scientific method and the role of research 
in I-O psychology. Now we will consider the operations that define research in greater
detail. In carrying out research, a series of decisions need to be made before the research 
actually begins. These decisions include: 


• Will the research be conducted in a laboratory under controlled conditions, or in
the field?

• Who will the participants be?

• If there are different conditions in the research (e.g., some participants exposed to
a condition, and other participants not exposed to the condition), how will participants
be assigned to the various conditions?

• What will the variables of interest be?

• How will measurements on these variables be collected?

Collectively, the answers to these questions will determine the research design, the
architecture for the research.

Research design Provides the overall structure or architecture for the research study; allows
investigators to conduct scientific research on a phenomenon of interest.

Experimental design Participants are randomly assigned to different conditions.

Quasi-experimental design Participants are assigned to different conditions, but random
assignment to conditions is not possible.

Nonexperimental design Does not include any “treatment” or assignment to different conditions.

Spector (2001) has reviewed research designs in I-O psychology and devised a system
of classification for distinguishing among the typical designs. He breaks designs down into
three basic types: experimental, quasi-experimental, and nonexperimental. Experimental
designs, whether the experiment is conducted in a laboratory or in the field, involve the
assignment of participants to conditions. As an example, some participants may receive a
piece rate payment for their work, whereas others receive an hourly rate. These two different
rates of pay would be two separate conditions, and participants might be assigned
randomly to one condition or the other. The random assignment of participants is one
of the characteristics that distinguishes an experiment from a quasi-experiment or nonexperiment.
If participants are randomly assigned to conditions, then any differences
that appear after the experimental treatment are more likely to conform to cause-effect
relationships.
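To make the idea of random assignment concrete, the short Python sketch below shuffles a list of hypothetical workers and deals them into the piece-rate and hourly-rate conditions described above. The worker labels, the function name randomly_assign, and the seed value are illustrative assumptions; they do not come from any study discussed in this chapter.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into the conditions."""
    rng = random.Random(seed)          # a seed only makes the example repeatable
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical participant labels (an assumption for illustration only).
workers = [f"worker_{n:02d}" for n in range(1, 21)]
assignment = randomly_assign(workers, ["piece_rate", "hourly_rate"], seed=42)

for condition, members in assignment.items():
    print(condition, len(members))
```

Because chance alone decides who lands in each condition, pre-existing differences among workers should be spread roughly evenly across the two pay plans, which is what allows later differences to be read as effects of the treatment.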

It is not always possible to assign participants randomly to a condition. For example,
an organization might institute a new pay plan at one plant location, but not at another.
Or the researcher would assess employee satisfaction with an existing pay plan, then the
organization would change the pay plan, and the researcher would assess satisfaction again
with the new plan. This would be called a quasi-experimental design.

[Photographs] One way to enhance validity is to make experimental conditions as similar as possible to actual work situations:
(a) actual radar being used by an air traffic controller; (b) a simulated radar screen designed for an experiment.

In the experimental and quasi-experimental designs described above, the pay plan was
a “treatment” or condition. Nonexperimental designs do not include any “treatment” or
conditions. In a nonexperimental design, the researcher would simply gather information
about the effects of a pay plan without introducing any condition or treatment. Researchers
often use the term “independent variable” to describe the treatment or antecedent condition
and the term “dependent variable” to describe the subsequent behavior of the research
participant. Spector (2001) identifies two common nonexperimental designs as the observational
design and the survey design. In the observational design, the researcher watches
an employee’s behavior and makes a record about what is observed. An observer might,
for example, study communication patterns and worker efficiency by recording the number
of times a worker communicates with a supervisor in a given time period. Alternatively,
in the survey design, the worker is asked to complete a questionnaire describing typical
interaction frequency with his or her supervisor. Table 2.1 presents an outline of the more
common research designs in I-O psychology.

Observational design The researcher observes employee behavior and systematically records what
is observed.

Survey design Research strategy in which participants are asked to complete a questionnaire or
survey.

TABLE 2.1 Common Research Designs in I-O Psychology

DESIGN               DESCRIPTION
Experimental         Random assignment of participants to conditions
Quasi-experimental   Nonrandom assignment of participants to conditions
Nonexperimental      No unique conditions for participants
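The three designs in Table 2.1 differ on just two questions: whether participants are placed in distinct conditions at all and, if so, whether that placement is random. The Python sketch below expresses that decision rule. The function name classify_design and its two parameters are hypothetical labels chosen for illustration.

```python
def classify_design(has_conditions: bool, random_assignment: bool) -> str:
    """Return the research design label implied by the two questions above."""
    if not has_conditions:
        # No treatment or conditions: the researcher only observes or surveys.
        return "nonexperimental"
    if random_assignment:
        return "experimental"
    return "quasi-experimental"


# Piece-rate versus hourly pay, with workers assigned at random.
print(classify_design(has_conditions=True, random_assignment=True))
# A new pay plan introduced at one plant location but not at another.
print(classify_design(has_conditions=True, random_assignment=False))
# A questionnaire about an existing pay plan, with no treatment introduced.
print(classify_design(has_conditions=False, random_assignment=False))
```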






























Because of the increasing frequency of the use of the Internet for survey research, one 
might question whether online surveys and paper and pencil surveys produce equivalent 
results. Although differences in administration mode are not dramatic, it does appear that 
younger respondents prefer an online to a paper and pencil survey (Church, 2001). We 
will discuss the strengths and weaknesses of various research designs in greater detail in 
Chapter 7 when we consider the evaluation of training programs. 

Another interesting aspect of survey research relates to the characteristics of those who 
do not respond. Spitzmuller, Glenn, Barr, Rogelberg, and Daniel (2006) found that non¬ 
respondents tended to see their organization as less just and fair. Similarly, Spitzmuller, 
Glenn, Sutton, Barr, and Rogelberg (2007) found that non-respondents tended to be less 
willing to volunteer extra effort in their organization, less willing to engage in team activ¬ 
ities, and less courteous. These findings present an interesting problem: It has often been 
assumed that those who respond to a survey are not different from those who do not respond 
in any fundamental way—that is, in a way that would distort the conclusions drawn from 
the survey. But these results contradict that assumption, suggesting that we may only be 
able to draw conclusions about people who responded and that we may be unable to extend 
those conclusions to individuals who did not respond. 
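A small numerical sketch shows why this matters. The ratings below are invented for illustration (they are not data from the Spitzmuller studies): each pair holds a perceived-fairness rating on a 1 to 5 scale and a flag for whether that employee returned the survey. If non-respondents see the organization as less fair, the average computed from returned surveys overstates the average for the whole workforce.

```python
from statistics import mean

# Hypothetical workforce: (fairness rating from 1 to 5, returned the survey?)
workforce = [
    (5, True), (4, True), (4, True), (5, True), (3, True), (4, True),
    (2, False), (3, False), (2, False), (1, False),
]

# What the researcher actually sees: ratings from respondents only.
respondent_mean = mean(rating for rating, responded in workforce if responded)

# What the researcher would like to know: the rating across everyone.
workforce_mean = mean(rating for rating, _ in workforce)

print(f"Respondents only: {respondent_mean:.2f}")
print(f"Entire workforce: {workforce_mean:.2f}")
print(f"Overstatement:    {respondent_mean - workforce_mean:.2f}")
```

In practice the second figure is unavailable, which is exactly the problem: the size of the gap cannot be estimated from the returned surveys alone.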

The various research designs we have described in this chapter are not used with equal 
frequency. Schaubroeck and Kuehn (1992) found that 67 percent of published studies 
conducted by I-O psychologists were done in the field and 33 percent in a laboratory.
Laboratory-based studies were usually experimental in design and used students as par¬ 
ticipants. Most field studies were not experimental and typically used employees as par¬ 
ticipants. In a follow-up study, Spector (2001) found very similar results. 

There are several reasons for the prevalence of nonexperimental field research in I-O
psychology. The first is the limited extent to which a laboratory experiment can reason¬ 
ably simulate “work” as it is experienced by a worker. The essence of laboratory research 
is control over conditions. This means that the work environment tends to be artificial 
and sterile, and the research deals with narrow aspects of behavior. Another, related rea¬ 
son is that experiments are difficult to do in the field because workers can seldom be ran¬ 
domly assigned to conditions or treatments. The goal of a real-life business organization 
is an economic one, not a scientific one. Finally, laboratory experiments often involve “sam¬ 
ples of convenience” (i.e., students) and there is considerable doubt that the behavior of 
student participants engaging in simulated work reasonably represents the behavior of actual 
workers. Laboratory studies provide excellent methods of control and are more likely to 
lead to causal explanations. Field studies permit researchers to study behaviors difficult 
to simulate in a laboratory, but cause-effect relationships are more difficult to examine 
in such field studies. 

Recently, there has been spirited debate (Landy, 2008a, b; Rudolph & Baltes, 2008; Wessel
& Ryan, 2008) about the concept of stereotyping and its possible role in making work- 
related decisions. Social psychologists have argued that research on stereotyping shows that 
women, or older employees, or ethnic minority employees are treated more harshly when 
it comes to distributing work rewards such as pay increases and promotions. But the lion’s 
share of that research has been conducted with college students who are asked to make 
decisions about fictitious employees. Some organizational psychologists argue that oper¬ 
ational decisions (those involving real managers making decisions about their respective 
subordinates) provide the best opportunity to study the possible effects of stereotyping, 
not “artificial” decisions. Unlike the decisions made by managers, those that student par¬ 
ticipants make have no effect on a real person, are not made public to anyone, do not 
need to be defended to an employee, are made with little or no training or experience, 
and do not have the benefit of a deep and broad interaction history with the “employee.” 
Although it is easier and more convenient to do research with students, the conclusions 
drawn from such research may be too far removed from the real world of work to be of 
any enduring value. 




Methods of Data Collection 


Qualitative and Quantitative Research 

Historically, I-O psychology, and particularly the “I” part of I-O, has used quantitative
methods for measuring important variables or behavior. Quantitative methods rely heav¬ 
ily on tests, rating scales, questionnaires, and physiological measures (Stone-Romero, 2002). 
They yield results in terms of numbers. They can be contrasted with more qualitative 
methods of investigation, which generally produce flow diagrams and narrative descrip¬ 
tions of events or processes, rather than “numbers” as measures. Qualitative methods include 
procedures like observation, interview, case study, and analysis of diaries or written docu¬ 
ments. The preference for quantitative over qualitative research can be attributed, at least 
in part, to the apparent preference of journal editors for quantitative research (Hemingway, 
2001), possibly because numbers and statistical analyses conform to a traditional view of 
science (Symon, Cassell, & Dickson, 2000). As an example, fewer than .3 percent of the 
articles published in the Journal of Applied Psychology since 1990 would be classified as 
“qualitative” (Marchel & Owens, 2007). Today’s students may be surprised to know that
in the early days of psychology, the “experimental method” was introspection, in which
the participant was also the experimenter, who would record his or her experiences in completing
an experimental task. This method would be considered hopelessly subjective by
today’s standards. Some (e.g., Marchel & Owens, 2007) have speculated that the founding
fathers of psychology would be unable to find academic employment today!

You will notice that we described the issue as qualitative and quantitative research, as
opposed to qualitative versus quantitative research. The two are not mutually exclusive
(Rogelberg, 2002). As an example of qualitative research, consider an extended observation
of a worker, which might include videotaped episodes of performance. That qualitative
video record could easily be used to develop a frequency count of a particular behavior,
which is a quantitative measure. Qualitative and quantitative research could be combined
in the same way with an interview or a diary kept by a worker.
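The step from a qualitative record to a quantitative measure can be sketched in a few lines of Python. The coded episodes below are hypothetical: they stand in for what a researcher might note while reviewing a videotaped work session, and the behavior labels are assumptions made for the example.

```python
from collections import Counter

# Hypothetical coding of a videotaped session: (minute on tape, coded behavior).
coded_episodes = [
    (2, "talks_with_supervisor"),
    (5, "operates_equipment"),
    (9, "talks_with_supervisor"),
    (14, "helps_coworker"),
    (21, "operates_equipment"),
    (27, "talks_with_supervisor"),
]

# Tally how often each coded behavior appears in the qualitative record.
behavior_counts = Counter(behavior for _, behavior in coded_episodes)

for behavior, count in behavior_counts.most_common():
    print(f"{behavior}: {count}")
```

The narrative record and the tally describe the same session; the count simply summarizes one aspect of it in numerical form.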
Much of the resistance to qualitative research is the result of viewing it as excessively
subjective. This concern is misplaced. All methods of research ultimately require interpretation,
regardless of whether they are quantitative or qualitative. The researcher is an
explorer, trying to develop an understanding of the phenomenon he or she has chosen to
investigate, and in so doing, should use all of the information available, regardless of its
form. The key is in combining information from multiple sources to develop that theory.
Rogelberg and Brooks-Laber (2002) refer to this as triangulation: looking for converging
information from different sources. Detailed descriptions of qualitative research
methods have been presented by Locke and Golden-Biddle (2002) and Bachiochi and Weiner
(2002). Stone-Romero (2002) presents an excellent review of the variations of research
designs in I-O psychology, as well as their strengths and weaknesses.

The Importance of Context in Interpreting Research

The added value of qualitative research is that it helps to identify the context for the behavior
in question (Johns, 2001a). Most experiments control variables that might “complicate”
the research and, in the process, eliminate “context.” In doing so, this control
can actually make the behavior in question less, not more, comprehensible. Consider the
following examples.

Quantitative methods Rely on tests, rating scales, questionnaires, and physiological measures, and
yield numerical results.

Qualitative methods Rely on observation, interview, case study, and analysis of diaries or written
documents and produce flow diagrams and narrative descriptions of events or processes.

Introspection Early scientific method in which the participant was also the experimenter, recording
his or her experiences in completing an experimental task; considered very subjective by modern
standards.

Triangulation Approach in which researchers seek converging information from different sources.

1. Allmendinger, Hackman, and Lehman (1996) studied 78 professional symphony
orchestras in four different countries and discovered that the greater the proportion
of female members of an orchestra, the lower the work motivation and job
satisfaction of its members. It would have been tempting to conclude that women
in symphony orchestras were disruptive. Instead, the authors dug beneath the “results”
and discovered that the reasons for these findings were: (1) As women increasingly
assumed roles that males had occupied before (e.g., percussionist instead of harpist),
the older males became more upset; and (2) national culture made a big difference
in the acceptance of women members. The U.S. and the U.K. were much more accepting
of women members than West Germany and the former East Germany.

2. A study of patient care teams directed by a nurse-manager found that there was a 
strong association between coaching, goal setting, team satisfaction, and the per¬ 
formance of the team as perceived by team members, and medical errors by the 
team. Unfortunately, however, the association was positive: the higher the ratings 
of each of the first three elements, the greater the number of medical errors by that
team! By collecting qualitative data through interviews and observations, the 
researchers were able to unravel this seeming mystery. It turned out that the most 
positive teams (more coaching, goal setting, satisfaction) were also those most will¬ 
ing to acknowledge errors and use them to learn, while the least positive teams (less 
coaching, fewer goals, lower satisfaction) covered up errors and did not learn from 
them (Hackman, 2003). 

3. A study of convenience stores found that those stores with less friendly sales 
persons had higher sales than the stores with more friendly sales staff members 
(Sutton & Rafaeli, 1988). Further investigation revealed that, because the less 
friendly stores were busier to start with, the staff had less time to be friendly. It was 
not that a nasty demeanor in a salesperson spurred sales. 

4. You have already been introduced to the Hawthorne studies. They were largely com¬ 
pleted by 1935. Nevertheless, controversy continues to surround their interpreta¬ 
tion (Olson, Verley, Santos, & Salas, 2004). At the simplest level, it appeared that 
simply paying attention to workers improved productivity. But things are not that 
simple. The studies were carried out during the Great Depression, when simply hav¬ 
ing a job—any job—was considered lifesaving. Additionally, the psychologist who 
described these studies to the popular press was an avowed anti-unionist (Griffin, 
Landy, & Mayocchi, 2002) and inclined to highlight any practice that contradicted 
the position of the union movement. If there were consistent productivity increases 
—and it is not clear that there were—these changes could not be understood with¬ 
out a broader appreciation for the context in which they occurred and were reported. 

In each of these examples, the critical variable was context. It was the situation in which 
the behavior was embedded that provided the explanation. Had the researchers not inves¬ 
tigated the context, each of these studies might have resulted in exactly the wrong policy 
change (i.e., don’t hire female orchestra members, don’t coach or set goals for medical 
teams, don’t hire friendly sales clerks). Context enhances the comprehensibility and, ulti¬ 
mately, the value of research findings. 

Generalizability and Control in Research 


Generalize To apply the 
results from one study or 
sample to other participants 
or situations. 


Generalizability 

One of the most important issues in conducting research is how widely the results can be 
generalized. There is a relatively simple answer to that question. An investigator can gener¬ 
alize results to areas that have been sampled in the research study. Consider Figure 2.1, 



CASE STUDY 2.1 TRIANGULATION: THE FINANCIAL CONSULTANT 


Job analysis Process that 
determines the important 
tasks of a job and the 
human attributes necessary 
to successfully perform 
those tasks. 


In Chapter 4, we will consider the topic of job 
analysis. Job analysis is a process used by I-O 
psychologists to gain understanding of a job. 
It includes an investigation of the tasks and duties 
that define the job, the human attributes necessary 
to perform the job, and the context in which that job 
is performed. Job analysis typically involves the 
combination of data from many different sources 
in coming to a complete understanding, or theory, 
of the job in question. 

Consider the job of a financial consultant or 
stockbroker who advises individual private investors 
on how to invest their money. Large financial 
investment firms employ thousands of these finan¬ 
cial consultants to provide service to their high-end 
clients. Suppose you were hired as an I-O psychologist 
to study and “understand” the job of a financial 
consultant with an eye toward developing a 
recruiting, selection, and training program for 
such individuals. 

How might you achieve this understanding? 
First, you might examine what the organization 
has to say about the job on its website and in 
its recruiting materials. Then you might talk with 
senior executives of the organization about the 
role the financial consultant plays in the success of 
the organization. Next you might tour the country 
interviewing and observing a sample of financial con¬ 
sultants as they do their work, in the office and out¬ 
side the office. You might also ask them to show 
you their daily appointment calendars and answer 
questions about the entries in these calendars. As 
part of this experience, you might spend several 
days with a single financial consultant and observe 
the variety of tasks he or she performs. Next you 
might interview the immediate managers of finan¬ 
cial consultants and explore their views of what 
strategies lead to success or failure for consultants. 

You might also interview retired financial con¬ 
sultants, as well as financial consultants who left their 
consulting positions with the company to become 
managers. Finally, you might ask a sample of finan¬ 
cial consultants and managers to complete a ques¬ 
tionnaire in which they rate the relative importance 
and frequency of the tasks that consultants perform, 
as well as the abilities and personal characteristics 
necessary to perform those tasks successfully. By 
gathering and interpreting this wealth of informa¬ 
tion, you will gain an excellent understanding of the 
job. Each of the methods of investigation gave you 
additional information. No one method was more 
important than any other method, and no method, 
alone, would have been sufficient to achieve an 
understanding of the position. This is the type of 
“triangulation” that Rogelberg and Brooks-Laber 
(2002) advocate. 


which is made up of concentric circles representing various factors or variables that might 
be sampled in a study. The first area for sampling might be participants or employees. 
If our research sample is representative of a larger population (e.g., all individuals who 
work for the organization and have a given job title), then we can feel more confident in 
generalizing to this larger population of participants who might have been in our study. 
The next circle represents job titles. If the job titles of the participants in our study are a 
representative sample of the population of job titles that exist in a particular company, 
then we can be more confident about generalizing to this larger population of jobs. The 
next circle represents time. If we have collected data at several different points in time, we 
can feel more confident in generalizing across time periods than we would if all the data 
came from one point in time. The final circle represents organizations. If we have collected 
our data from many different organizations, we can be more confident in extending our 
findings beyond a single organization. 







Chapter 2 Methods and Statistics in I-O Psychology 

FIGURE 2.1 Sampling Domains for I-O Research 


Let’s take a concrete example. Suppose you conducted a 
research study to assess how well recent college graduates from 
the United States would adapt to working overseas. How would 
you maximize the generalizability of your conclusions? You 
might take the following steps: 

1. Sample graduates from many different educational institutions. 

2. Sample graduates from several different graduating classes. 

3. Sample graduates with degrees in a wide variety of majors. 

4. Sample graduates who work for many different companies. 

5. Sample graduates who work in many different depart¬ 
ments within those companies. 

6. Sample graduates assigned to many different countries 
outside the United States. 

If you were able to achieve this sampling, your results would 
be quite generalizable. But, of course, sampling as wide ranging 
as this is time consuming and expensive, so compromises are 
often made. And every time a compromise is made (e.g., data are 
gathered from graduates of only one institution, or from only one graduating class, only 
one major, or graduates who were assigned to only one country), the generalizability of 
the results is reduced. 


Finally, do not confuse sample size with sample representativeness. A large but non¬ 
representative sample is much less valuable for purposes of generalizability than a smaller 
but representative sample. 
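
The difference between sample size and representativeness can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the 20 institutions, the 10,000 graduates, and the "adaptation scores" are invented for illustration and do not come from any study. It compares a large sample drawn from only four institutions with a small sample drawn at random from all of them.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 20 institutions with 500 graduates each.
# Institutions differ in their average overseas-adaptation score.
population = []
for institution in range(20):
    institution_mean = 50 + 2 * (institution - 9.5)  # 31 for the lowest, 69 for the highest
    for _ in range(500):
        population.append((institution, random.gauss(institution_mean, 5)))

true_mean = statistics.fmean([score for _, score in population])

# Large but nonrepresentative sample: all 2,000 graduates of four institutions.
large_biased = [score for inst, score in population if inst >= 16]

# Small but representative sample: 100 graduates drawn at random from everyone.
small_random = [score for _, score in random.sample(population, 100)]

large_error = abs(statistics.fmean(large_biased) - true_mean)
small_error = abs(statistics.fmean(small_random) - true_mean)

print(f"Population mean:           {true_mean:.1f}")
print(f"Large biased sample error: {large_error:.1f} (n = {len(large_biased)})")
print(f"Small random sample error: {small_error:.1f} (n = {len(small_random)})")
```

In this made-up population the large sample misses the population mean by a wide margin because it covers only one corner of the sampling domain, while the much smaller random sample lands close to it.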


Control 


Experimental control 
Characteristic of research in 
which possible confounding 
influences that might make 
results less reliable or 
harder to interpret are 
eliminated; often easier to 
establish in laboratory 
studies than in field studies. 


Statistical control Using 
statistical techniques to 
control for the influence of 
certain variables. Such 
control allows researchers 
to concentrate exclusively 
on the primary relationships 
of interest. 


When research is conducted in the field, events and variables often can obscure the results. 
The primary reason why psychologists do laboratory studies, or experiments, is to elim¬ 
inate these distracting variables through experimental control. If you tried to study 
problem-solving behaviors among industrial workers at the workplace, you might find your 
study disrupted by telephone calls, machine breakdowns, missing team members, and urgent 
requests from a supervisor. But if you conducted the same study in a laboratory, none of 
those distractions would be present. By using this form of control, you eliminate possible 
confounding influences that might make your results less reliable or harder to interpret. 

Unfortunately, the strength of experimental control is also its weakness. As we discussed 
earlier in the chapter, experimental control can make the task being studied sterile and 
reduce its practical value. Consider the physical trainer who employs exercise machines 
to isolate muscles versus the trainer who prefers to use free weights in training. The first 
trainer is exercising experimental control, whereas the second is not. The free weight trainer 
knows that in everyday life you cannot use isolated muscle groups. When a heavy box you 
are carrying shifts, you have to compensate for that shift with abs and back muscles and legs. 
Training without control (i.e., with free weights) prepares you for that real-life challenge; 
training with exercise machines may not. To return to our research situation, there are times 
when the lack of experimental control actually enhances what we learn about a behavior. 

There is another form of control that can be equally powerful. It is known as statistical 
control. As an example, suppose you wanted to study the relationship between job 
satisfaction and leadership styles in a company, and had at your disposal a representative 
sample of employees from many different departments, of both genders, of varying 
age, and of varying educational backgrounds. Suppose you were concerned that the rela¬ 
tionship of interest (job satisfaction and leadership style) might be obscured by other 
influences, such as the employees’ age, gender, educational level, or home department. 




You could use statistical techniques to control for the influence of these other variables, 
allowing you to concentrate exclusively on the relationship between satisfaction and 
leadership style. In I-O psychology, statistical control is much more common and more 
realistic than experimental control. 
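
One simple form of statistical control is the partial correlation, which estimates the association between two variables after the linear influence of a third has been removed. The sketch below is only an illustration under stated assumptions: the data are simulated, the helper names `pearson_r` and `partial_r` are our own, and age is deliberately built in as the only reason leadership ratings and satisfaction are related. Researchers often use other techniques, such as regression with several covariates, for the same purpose.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys with the linear influence of zs removed."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(7)
# Simulated (hypothetical) employees: standardized age influences both
# variables, which have no direct link to each other in this simulation.
age = [random.gauss(0, 1) for _ in range(2000)]
leadership = [0.7 * a + random.gauss(0, 1) for a in age]
satisfaction = [0.7 * a + random.gauss(0, 1) for a in age]

zero_order = pearson_r(leadership, satisfaction)
controlled = partial_r(leadership, satisfaction, age)

print(f"Correlation ignoring age:        {zero_order:.2f}")
print(f"Correlation controlling for age: {controlled:.2f}")
```

Because age was built in as the only common cause, the apparent association shrinks toward zero once age is controlled. With real data the controlled value could just as easily stay the same or grow.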

Ethical Behavior in I-O Psychology 


Physicians swear to abide by the Hippocratic oath, whose first provision is to “do no harm.” 
This is the keystone of their promise to behave ethically. Most professions have ethical 
standards that educate their members regarding appropriate and inappropriate behavior, 
and psychology is no exception. Every member of the American Psychological Association 
agrees to follow the ethical standards published by that governing body (APA, 2002). If a 
member violates a standard, he or she can be dropped from membership in the organization. 
Although I-O psychologists do not have a separate code of ethics, SIOP has endorsed 
a collection of 61 cases that illustrate ethical issues likely to arise in situations that an 
I-O psychologist might encounter (Lowman, 1998). In addition to the APA principles and 
the case book that SIOP provides, other societies (e.g., Academy of Management, 1990; 
Society for Human Resource Management, 1990) publish ethical standards that are 
relevant for I-O psychologists. 

Formulating ethical guidelines for I-O psychologists can be very challenging because 
the work of an I-O psychologist is incredibly varied. Issues include personnel decisions, 
safety, organizational commitment, training, and motivation, to name but a few. The issues 
may be addressed as part of a consulting engagement, in-house job duties, or research. 
Because every situation is different, there is no simple formula for behaving ethically (although 
the “do no harm” standard never ceases to apply). The SIOP case book addresses topics 
as varied as testing, validity studies, result reporting, layoffs, sexual harassment, employee 
assistance programs, data collection, confidentiality, and billing practices. 

Because this is your first course in I-O psychology, it is unlikely that you will be hired 
in the capacity of an I-O psychologist until you have completed a number of additional 
courses or received a graduate degree in I-O psychology. Nevertheless, if you are assisting 
an I-O psychologist in research or practice, or are simply interested in learning more 
about ethical guidelines of the profession, there are a number of sources that will prove 
valuable. Joel Lefkowitz (2003) has published a text on the broad issues of values and ethics 
in I-O psychology. It covers psychological practice generally, as well as issues specific to 
I-O psychology. Table 2.2 describes some of the ways in which an I-O psychologist can 
contribute to ethical practice by organizations. 

A second valuable source for considering ethical issues in the practice of I-O psychology 
is a column that appears regularly in TIP called “The I-O Ethicist.” This column 
presents concrete ethical dilemmas submitted by I-O psychologists. A distinguished and 
experienced panel of I-O psychologists provides answers to these ethical questions. Some 
of the issues raised in recent columns include: 

• A consultant developed and validated an employment screening test for a client, 
but was told that in spite of that professional development, the client intended to 
use an unvalidated and questionable screening device for hiring. The dilemma was 
whether the psychologist, as a consultant, had any right or responsibility to object. 
The panel agreed that the psychologist did have ethical responsibilities, including 
warning the client of the risks in using the unvalidated procedure and making it 
clear that the consultant could not defend the unvalidated procedure (Macey, 2004a). 

• An I-O psychologist wondered if it was ethical to assess a potential American 
expatriate manager on issues such as stability of marriage and family life, since these 







TABLE 2.2 Potential Roles Available to the I-O Psychologist and Other Human Resource Managers with Respect to 
Ethical Problems 

ROLE             DESCRIPTION 
Advisory         Advising organizational members on ethical standards and policies 
Monitoring       Monitoring actions/behaviors for compliance with laws, policies, and ethical standards 
Educator         Instructing or distributing information regarding ethical principles and organizational policies 
Advocate         Acting on behalf of individual employees or other organizational stakeholders, and protecting 
                 employees from managerial reprisals 
Investigative    Investigating apparent or alleged unethical situations or complaints 
Questioning      Questioning or challenging the ethical aspects of managers' decisions 
Organizational   Explaining or justifying the organization's actions when confronted by agents external to the 
                 organization 
Model            Modeling ethical practices to contribute to an organizational norm and climate of ethical behavior 

Source: Lefkowitz (2003). 


were not directly job related and might be construed as an invasion of the manager’s 
privacy. The dilemma was that research demonstrates that expatriate success often 
hinges on these issues. The panelists provided no clear answer to this dilemma, but 
there was some agreement that the best approach might be to inform the potential 
expatriate of the importance of a stable family and marriage, as the candidate might 
then be likely to self-select in or out based on that information (Macey, 2004b). 

• An I-O psychologist asked if it was ethical to hold or buy stock in a company for 
which the I-O psychologist consulted. The panel advised that there would be little 
concern if the psychologist purchased the stock before entering into the consulting 
relationship, but it would be less appropriate if the stock were purchased after 
the consultant relationship came into being. The major concern was that the I-O 
psychologist might be privy to information that an outsider might not be. If that 
were the case, the psychologist might be violating not only an ethical standard of 
conflict of interest, but also the laws that govern insider trading. The panel was 
unanimous in agreeing that the psychologist should not purchase any stock of a 
client’s competitors because of clear conflicts of interest (Macey, 2004c). 

As more and more organizations expand their operations to include international and 
multinational business dealings, the ethical dilemmas for I-O psychologists will become 
much more complex. Suppose an I-O psychologist works for an organization that exploits 
cheap Third World labor in one of its divisions. How can the psychologist balance his or 
her duty to the employer and shareholder (i.e., enhance profitability) with the notion of 
doing no harm (Lefkowitz, 2004)? Suppose a psychologist is asked to design a leadership 
training program emphasizing the use of hierarchical and formal power for a culture that 
is low in power distance, or a motivational program based on interpersonal competition for 
a culture that would be characterized as noncompetitive (i.e., in Hofstede’s terms, feminine). 


Is it ethical to impose the culture of one nation on another through HR practices? There 
are no clear answers to these questions, but as the I-O psychologist expands his or her 
influence to global applications, issues such as these will become more salient. As we will 
see in Chapter 12 on leadership, the key to organizational ethical behavior is the behavior of 
leaders at all levels in the organization from first line supervisors to CEOs (Lefkowitz, 2006). 

Although it is not universally true, unethical or immoral behavior is generally accompanied 
by a clash between personal values and organizational goals in which the organizational 
goals prevail. Several business schools are now offering courses to MBA students 
that are intended to bring personal values and organizational behavior into closer alignment 
(Alboher, 2008). Lefkowitz (2008) makes a similar argument, pointing out that although 
I-O psychologists profess a strong commitment to understanding human behavior, they 
forget the human part of the equation in favor of the corporate economic objectives. He 
proposes that to be a true profession, I-O psychology needs to adopt and reflect societal 
responsibilities. As an example, rather than simply assisting in a downsizing effort, the 
I-O psychologist should see downsizing in the larger context of unemployment and all of 
its consequences, both individual and societal. Seen from this perspective, the work by 
Stuart Carr and his colleagues in the arena of world poverty (mentioned in Chapter 1) 
represents the moral and ethical high ground for I-O psychologists. Not every I-O psychologist 
need address poverty or hunger or illiteracy or global warming, but all should at 
least be aware that their interventions often have consequences that are considerably broader 
than the specific task they have undertaken. Instead of simply asking if a selection system 
is effective or if a motivational program is likely to increase productivity, the I-O psychologist 
might ask the broader question: Is this the right thing to do (Lefkowitz, 2008, p. 12)? 


MODULE 2.2 SUMMARY 






• Research designs may be experimental, quasi-experimental, 
or nonexperimental; two common nonexperimental designs are observation 
and survey. About two-thirds of I-O research uses 
nonexperimental designs. 

• Quantitative research yields results in terms of 
numbers, whereas qualitative research tends to 
produce flow diagrams and descriptions. The two 
are not mutually exclusive, however; the process 
of triangulation involves combining results 
from different sources, which may include both 
kinds of research. 

• The results of research can be generalized to 
areas included in the study; thus, the more 
areas a study includes, the greater its generalizability. 
Researchers eliminate distracting 
variables by using experimental and statistical 
controls. 

• Ethical standards for I-O psychologists are set 
forth by the APA, SIOP, and other organizations 
such as the Society for Human Resource 
Management and the Academy of Management. 
The overriding ethical principle is “do no 
harm.” 






KEY TERMS 


research design 
experimental design 
quasi-experimental design 
nonexperimental design 
observational design 


survey design 
quantitative methods 
qualitative methods 
introspection 
triangulation 


job analysis 
generalize 

experimental control 
statistical control 










2.3 Data Analysis 



Descriptive statistics 
Summarize, organize, and 
describe a sample of data. 

Measure of central tendency 
Statistic that indicates 
where the center of a 
distribution is located. 
Mean, median, and mode 
are measures of central 
tendency. 

Variability The extent to 
which scores in a 
distribution vary. 

Skew The extent to which 
scores in a distribution are 
lopsided or tend to fall on 
the left or right side of the 
distribution. 

Mean The arithmetic 
average of the scores in a 
distribution; obtained by 
summing all of the scores 
in a distribution and 
dividing by the sample size. 

Mode The most common or 
frequently occurring score. 


Descriptive and Inferential Statistics 


Descriptive Statistics 

In our discussion of research, we have considered two issues thus far: how to design a 
study to collect data, and how to collect those data. Assuming we have been successful at 
both of those tasks, we now need to analyze those data to determine what they may tell 
us about our initial theory, hypothesis, or speculation. We can analyze the data we have 
gathered for two purposes. The first is simply to describe the distribution of scores or num¬ 
bers we have collected. A distribution of numbers simply means that the numbers are arrayed 
along two axes. The horizontal axis is the score or number axis running from low to high 
scores. The vertical axis is usually the frequency axis, which indicates how many individ¬ 
uals achieved each score on the horizontal axis. The statistical methods to accomplish such 
a description are referred to as descriptive statistics. You have probably encountered this 
type of statistical analysis in other courses, so we will simply summarize the more import¬ 
ant characteristics for you. Consider the two distributions of test scores in Figure 2.2. Look 
at the overall shapes of those distributions. One distribution is high and narrow; the other 
is lower and wider. In the left graph, the distribution’s center (48) is easy to determine; 
in the right graph, the distribution’s center is not as clear unless we specify the central 
tendency measure of interest. One distribution is bell shaped or symmetric, while the other 
is lopsided. Three measures or characteristics can be used to describe any score distribution: 
measures of central tendency, variability, and skew. Positive skew means that the 
scores or observations are bunched at the bottom of the score range, with a tail of less 
frequent scores stretching toward the top; negative skew means that scores or observations 
are bunched at the top of the score range, with a tail stretching toward the bottom. As 
examples, if the next test you take in this course is very easy, most scores will pile up near 
the top and the score distribution will be negatively skewed; if the test is very hard, the 
scores are likely to be positively skewed. 

Measures of central tendency include the mean, the mode, and the median. The mean 
is the arithmetic average of the scores, the mode is the most frequently occurring score, 
and the median is the middle score (the score that 50 percent of the remaining scores fall 
above, and the other 50 percent of the remaining scores fall below). As you can see, the 
two distributions have different means, modes, and medians. In addition, the two dis¬ 
tributions vary on their lopsidedness or skewness. The left distribution has no skew; the right 








FIGURE 2.3 Two Score Distributions (N = 10) 

distribution is positively skewed, with some high 
scores pulling the mean to the positive (right) side. 

Another common descriptive statistic is the 
standard deviation or the variance of a distribution. 
In Figure 2.3, you can see that one distribution 
covers a larger score range and is wider than the 
other. We can characterize a distribution by looking 
at the extent to which the scores deviate from 
the mean score. The typical amount of deviation 
from a mean score is the standard deviation. Since 
distributions often vary from each other simply as 
a result of the units of measure (e.g., one distribution 
is a measure of inches, while another is a 
measure of loudness), sometimes it is desirable to 
standardize the distributions so that they all have 

























Inferential statistics Used 
to aid the researcher in 
testing hypotheses and 
making inferences from 
sample data to a larger 
sample or population. 


Statistical significance 
Indicates that the probability 
of the observed statistic is 
less than the stated 
significance level adopted 
by the researcher 
(commonly p < .05). A 
statistically significant 
finding indicates that if the 
null hypothesis were true, 
the results found are 
unlikely to occur by chance, 
and the null hypothesis is 
rejected. 


means of .00 and standard (or average) deviations of 1.00. The variance of a distribution 
is simply the squared standard deviation. 
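
The descriptive statistics discussed above can be computed in a few lines. The sketch below uses ten invented test scores (they are not the data behind Figures 2.2 or 2.3) and Python's standard `statistics` module. The skew index shown, the average cubed standardized deviation, is one common definition among several.

```python
import statistics

scores = [42, 45, 47, 48, 48, 48, 49, 51, 54, 78]  # hypothetical test scores

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability: standard deviation and variance, treating these ten scores as the whole group
sd = statistics.pstdev(scores)
variance = statistics.pvariance(scores)

# Skew: average cubed standardized deviation.
# Positive when a tail of high scores pulls the mean upward.
skew = sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)

# Standardized (z) scores have a mean of 0 and a standard deviation of 1
z_scores = [(x - mean) / sd for x in scores]

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"standard deviation = {sd:.2f}, variance = {variance:.2f}, skew = {skew:.2f}")
print(f"z-score mean = {statistics.fmean(z_scores):.2f}, z-score SD = {statistics.pstdev(z_scores):.2f}")
```

The single high score of 78 pulls the mean above the median and mode, which is what positive skew looks like in numbers.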

Inferential Statistics 

In the studies that you will encounter in the rest of this text, the types of analyses used 
are not descriptive, but inferential. When we conduct a research study, we do it for a 
reason. We have a theory or hypothesis to examine. It may be a hypothesis that accidents 
are related to personality characteristics, or that people with higher mental ability test scores 
perform their jobs better than those with lower scores, or that team members in small 
teams are happier with their work than team members in large teams. In each of these 
cases, we design a study and collect data in order to come to some conclusion, to draw 
an inference about a relationship. Once again, in other courses, you have likely been intro¬ 
duced to some basic inferential statistics. Statistical tests such as the t test, analysis of vari¬ 
ance or F test, or chi-square test can be used to see whether two or more groups of participants 
(e.g., an experimental and a control group) tend to differ on some variable of interest. 
For example, we can examine the means of the two groups of scores in Figure 2.3 to see 
if they are different beyond what we might expect as a result of chance. If I tell you that 
the group with the lower mean score represents high school graduates and the group with 
the higher mean score represents college graduates, and I further tell you that the means 
are statistically significantly different from what would be found with simple random or 
chance variation, you might draw the inference that education is associated with higher 
test scores. The statistical test used to support that conclusion (e.g., a t test of mean dif¬ 
ferences) would be considered an inferential test. 
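
To make the idea of a t test of mean differences concrete, the sketch below computes a pooled-variance t statistic for two independent groups. The scores are invented for illustration and the function name `pooled_t` is our own. The critical value of 2.101 is the standard two-tailed tabled value of t for 18 degrees of freedom at the .05 level.

```python
import math
import statistics

# Hypothetical test scores for two groups of 10 people each
high_school = [44, 46, 47, 48, 48, 49, 50, 51, 52, 55]
college = [50, 52, 53, 54, 55, 55, 56, 58, 59, 62]

def pooled_t(group_a, group_b):
    """Independent-samples t statistic using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error

t_value = pooled_t(high_school, college)
degrees_of_freedom = len(high_school) + len(college) - 2

CRITICAL_T_05 = 2.101  # two-tailed critical t for 18 degrees of freedom, from a t table
significant = abs(t_value) > CRITICAL_T_05

print(f"t({degrees_of_freedom}) = {t_value:.2f}, significant at .05: {significant}")
```

A t value beyond the critical value leads us to infer that the two group means differ by more than chance variation would typically produce.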

Statistical Significance 

Two scores, derived from two different groups, might be different, even at the third de¬ 
cimal place. How can we be sure that the difference is a “real” one—that it exceeds a dif¬ 
ference we might expect as a function of chance alone? If we examined the mean scores 
of many different test groups, such as the two displayed in Figure 2.3, we would almost 
never find that the means were exactly the same. A convention has been adopted to define 
when a difference or an inferential statistic is significant. Statistical significance is defined 
in terms of a probability statement. To say that a finding of difference between two groups 
is significant at the 5 percent level, or a probability of .05, is to say that a difference that 
large would be expected to occur only 5 times out of 100 as a result of chance alone. If 
the difference between the means was even larger, we might conclude that a difference 
this large might be expected to occur only 1 time out of 100 as a result of chance alone. 
This latter result would be reported as a difference at the 1 percent level, or a probability 
of .01. As the probability goes down (e.g., from .05 to .01), we become more confident 
that the difference is a real difference. It is important to keep in mind that the significance 
level addresses only the confidence that we can have that a result is not due to chance. It 
says nothing about the strength of an association or the practical importance of the result. 
The standard, or threshold, for significance has been set at .05 or lower as a rule of thumb. 
Thus, unless a result would occur only five or fewer times out of a hundred as a result of 
chance alone, we do not label the difference as statistically significant. 
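
One way to see what "as a result of chance alone" means is a permutation test, which is a different technique from the t test but rests on the same logic. The sketch below uses invented satisfaction ratings for members of small and large teams. It repeatedly shuffles the group labels and counts how often a difference at least as large as the observed one appears when group membership is random.

```python
import random
import statistics

# Hypothetical job satisfaction ratings (1 to 10 scale)
small_teams = [7, 8, 6, 9, 7, 8, 8, 6, 9, 7]
large_teams = [6, 5, 7, 6, 5, 6, 7, 4, 6, 5]

observed = statistics.fmean(small_teams) - statistics.fmean(large_teams)

random.seed(1)  # fixed seed so the illustration is repeatable
pooled = small_teams + large_teams
n_small = len(small_teams)
n_shuffles = 10_000
at_least_as_large = 0

for _ in range(n_shuffles):
    random.shuffle(pooled)  # assign people to the two groups at random
    shuffled_diff = statistics.fmean(pooled[:n_small]) - statistics.fmean(pooled[n_small:])
    if abs(shuffled_diff) >= abs(observed):
        at_least_as_large += 1

# Proportion of random reshufflings that match or exceed the observed difference
p_value = at_least_as_large / n_shuffles

print(f"Observed difference in means: {observed:.1f}")
print(f"Estimated probability under chance alone: {p_value:.4f}")
print(f"Significant at the .05 level: {p_value < 0.05}")
```

If the estimated probability falls below .05, the difference would be labeled statistically significant by the convention described above. The probability says nothing about how large or how practically important the difference is.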

In recent years I-O psychologists have vigorously debated the value of significance testing 
(Murphy, 1997; Schmidt, 1996; Bonett & Wright, 2007). Critics argue that the .05 level 
is arbitrary and that even higher levels—up to .10, for example—can still have practical 
significance. This is a very complex statistical debate that we will not cover here, but it is 
important to remember that you are unlikely to encounter studies in scientific journals 
with probability values higher than .05. The critics argue that there are many interesting 




findings and theories that never see the scientific light of day because of this arbitrary 
criterion for statistical significance. 

The Concept of Statistical Power 

Many studies have a very small number of participants in them. This makes it very difficult 
to find statistical significance even when there is a “true” relationship among variables. In 
Figure 2.3 we have reduced our two samples in Figure 2.2 from 30 to 10 by randomly 
dropping 20 participants from each group. The differences are no longer statistically significant. 
But from our original study with 30 participants, we know that the differences between 
means are not due to chance. Nevertheless, the convention we have adopted for defining 
significance prevents us from considering the new difference to be significant, even 
though the mean values and the differences between those means are identical to what 
they were in Figure 2.2. 

The concept of statistical power deals with the likelihood of finding a statistically significant 
difference when a true difference exists. The smaller the sample size, the lower the power 
to detect a true or real difference. In practice, this means that researchers may be drawing 
the wrong inferences (e.g., that there is no association) when sample sizes are too small. 
The issue of power is often used by the critics of significance testing to illustrate what is 
wrong with such conventions. Schmidt and Hunter (2002b) argued that the typical power 
of a psychological study is low enough that more than 50 percent of the studies in the 
literature do not detect a difference between groups or an effect of a treatment or independent 
variable on a dependent variable when one exists. Thus, adopting a convention that 
requires an effect to be “statistically significant” at the .05 level greatly distorts what we 
read in journals and how we interpret what we do read. 

Power calculations can be done before a study is ever initiated, informing the researcher 
of the number of participants that should be included in the study in order to have a 
reasonable chance of detecting an association (Cohen, 1988, 1994; Murphy & Myors, 1998). 
Research studies can be time consuming and expensive. It would be silly to conduct a study 
that could not detect an association even if one were there. The power concept also pro¬ 
vides a warning against casually dismissing studies that do not achieve “statistical signifi¬ 
cance” before looking at sample sizes. If the sample sizes are small, we may never know 
whether or not there is a real effect or difference between groups. 
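
A rough power calculation of this kind can be sketched with the normal approximation for comparing two group means. The assumptions are ours rather than the text's: the true difference is expressed as a standardized mean difference (often written d), the two groups are the same size, and the function names `approx_power` and `n_for_power` are hypothetical. An exact calculation would use the noncentral t distribution, so the figures here are approximate.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison of means."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * math.sqrt(n_per_group / 2)
    upper_tail = 1 - STANDARD_NORMAL.cdf(z_crit - shift)
    lower_tail = STANDARD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail

def n_for_power(effect_size_d, target_power=0.80, alpha=0.05):
    """Smallest per-group sample size whose approximate power reaches the target."""
    n = 2
    while approx_power(effect_size_d, n, alpha) < target_power:
        n += 1
    return n

# Assumed true difference of half a standard deviation (d = 0.5)
for n in (10, 30, 100):
    print(f"n = {n:3d} per group: approximate power = {approx_power(0.5, n):.2f}")

print(f"Per-group n needed for 80% power: {n_for_power(0.5)}")
```

Under these assumptions, cutting each group from 30 to 10 participants sharply lowers the chance of detecting the same true difference, which is the situation described for Figure 2.3.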


Statistical power The 
likelihood of finding a 
statistically significant 
difference when a true 
difference exists. 

Correlation and Regression 


As we saw in the discussion about research design, there are many situations in which 
experiments are not feasible. This is particularly true in I-O psychology. It would be unethical, 
for example, to manipulate a variable that would influence well-being at work, with 
some conditions expected to reduce well-being and others to enhance well-being. The most 
common form of research is to observe and measure natural variation in the variables of 
interest and look for associations among those variables. Through the process of measurement, 
we can assign numbers to individuals. These numbers represent the person’s 
standing on a variable of interest. Examples of these numbers are a test score, an index of 
stress or job satisfaction, a performance rating, or a grade in a training program. We may 
wish to examine the relationship between two of these variables to predict one variable 
from the other. For example, if we are interested in the association between an individual’s 
cognitive ability and training success, we can calculate the association between those two 
variables for a group of participants. If the association is statistically significant, then we 
can predict training success from cognitive ability. The stronger the association between the 


Measurement Assigning 
numbers to characteristics 
of individuals or objects 
according to rules. 









Chapter 2 Methods and Statistics in I-O Psychology



two variables, the better the prediction we are able to make
from one variable to another. The statistic or measure
of association most commonly used is the correlation
coefficient.

The Concept of Correlation

The best way to appreciate the concept of correlation is graphically. Look at the
hypothetical data in Figure 2.4. The vertical axis of that graph represents training grades.
The horizontal axis represents a score on a test of cognitive ability. For both axes, higher
numbers represent higher scores. This graph is called a scatterplot because it plots the
scatter of the scores. Each dot represents the two scores achieved by an individual. The
40 dots represent 40 people. Notice the association between test scores and training
grades. As test scores increase, training grades tend to increase as well. In high school
algebra, this association would have been noted as the slope or “rise over run,” meaning
how much rise (increase on the vertical axis) is associated with one unit of run (increase
on the horizontal axis). In statistics, the name for this form of association is correlation,
and the index of correlation or association is called the correlation coefficient. You will
also notice that we have drawn a solid straight line that goes through the scatterplot. This
line (technically known as the regression line) is the straight line that best “fits” the
scatterplot. The line can also be presented as an equation that specifies where the line
intersects the vertical axis and what the angle or slope of the line is.

FIGURE 2.4 Correlation between Test Scores and Training Grades

As you can see from Figure 2.4, the actual angle of the line that depicts the association 
is influenced by the units of measurement. If we plotted training grades against years of 
formal education, the angle or slope of the line might look quite different, as is depicted 
in Figure 2.5, where the slope of the line is much less steep or severe. For practical pur¬ 
poses, the regression line can be quite useful. It can be used to predict what value on the 
Y variable (in Figure 2.4, training grades) might be expected for someone with a particu¬ 
lar score on the X variable (here, ability test scores). Using the scatterplot that appears in 
Figure 2.4, we might predict that an individual who achieved a test score of 75 could be 
expected to also get a training grade of 75 percent. We might use that prediction to make 
decisions about whom to enroll in a training program. Since we would not want to enroll 
someone who might be expected to fail the training program (in our case, receive a train¬ 
ing grade of less than 60 percent), we might limit enrollment to only those applicants who 
achieve a score of 54 or better on the cognitive ability test. 
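
The prediction logic described above can be sketched in a few lines of code. The example fits a least-squares regression line to a handful of invented test scores and training grades, predicts a grade from a test score, and then works backward from a passing grade of 60 percent to a cutoff score. The data are hypothetical and are not the 40 cases plotted in Figure 2.4, so the numbers it produces will not match the 75 and 54 read from that figure. The function names are likewise illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # "rise over run"
    intercept = mean_y - slope * mean_x    # where the line crosses the Y axis
    return slope, intercept


def predict(score, slope, intercept):
    """Predicted Y (training grade) for a given X (test score)."""
    return intercept + slope * score


def cutoff_for(target, slope, intercept):
    """X score at which the predicted Y equals the target value."""
    return (target - intercept) / slope


# Hypothetical data: cognitive test scores and training grades.
test_scores = [40, 50, 60, 70, 80]
training_grades = [52, 58, 66, 70, 79]

slope, intercept = fit_line(test_scores, training_grades)
print("predicted grade for a score of 75:", round(predict(75, slope, intercept), 1))
print("score needed for a predicted grade of 60:", round(cutoff_for(60, slope, intercept), 1))
```

Running the line "backward" in this way is what allows a passing grade to be translated into a minimum test score for enrollment.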

The Correlation Coefficient 

For ease of communication and for purposes of further analysis, the correlation
coefficient is calculated in such a way that it always permits the same inference,
regardless of the variables that are used. Its absolute value will always range between
.00 and 1.00. A high value (e.g., .85) represents a strong association and a lower value
(e.g., .15) represents a weaker association. A value of .00 means that there is no
association between two variables. Generally speaking, in I-O psychology, correlations in
the range of .10 are considered close to trivial, while correlations of .40 or above are
considered substantial.


Correlation coefficient
Statistic assessing the
bivariate, linear association
between two variables.
Provides information about
both the magnitude
(numerical value) and the
direction (+ or -) of the
relationship between two
variables.

Scatterplot Graph used to
plot the scatter of scores on
two variables; used to
display the correlational
relationship between two
variables.

Regression line Straight
line that best “fits” the
scatterplot and describes
the relationship between the
variables in the graph; can
also be presented as an
equation that specifies
where the line intersects the
vertical axis and what the
angle or slope of the line is.



FIGURE 2.5 Correlation between Years of Education and Training Grades


2.3 Data Analysis 





FIGURE 2.6 Scatterplots Representing Various Degrees of Correlation 


Correlation coefficients have two distinct parts. The first part is the actual value or
magnitude of the correlation (ranging from .00 to 1.00). The second part is the sign (+ or -)
that precedes the numerical value. A positive (+) correlation means that there is a positive
association between the variables. In our examples, as test scores and years of education
go up, so do training grades. A negative (-) correlation means that as one variable
goes up, the other variable tends to go down. An example of a negative correlation would
be the association between age and visual acuity. As people get older, their uncorrected
vision tends to get worse. In I-O psychology, we often find negative correlations between
measures of commitment and absence from work. As commitment goes up, absence tends
to go down, and vice versa. Figure 2.6 presents examples of the scatterplots that represent
various degrees of positive and negative correlation. You will notice that we have again
drawn straight lines to indicate the best fit straight line that represents the data points.

By examining the scatterplots and the corresponding regression lines, you will notice
something else about correlation. As the data points more closely approach the straight
line, the correlation coefficients get higher. If all of the data points fell exactly on the line,
the correlation coefficient would be 1.00 and there would be a “perfect” correlation
between the two variables. We would be able to perfectly predict one variable from
another. As the data points depart more from the straight line, the correlation coefficient
gets lower until it reaches .00, indicating no relationship at all between the two variables.
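
A short computation shows how the coefficient behaves as points move toward or away from a straight line. The sketch below implements the usual product-moment (Pearson) formula directly and applies it to three invented sets of training grades: one that falls exactly on a rising line, one that is scattered around it, and one that falls exactly on a falling line. All data and variable names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


test_scores = [40, 50, 60, 70, 80]
on_rising_line = [50, 55, 60, 65, 70]    # every point sits on one straight line
scattered = [62, 50, 71, 58, 74]         # same upward trend, far more scatter
on_falling_line = [70, 65, 60, 55, 50]   # perfect negative association

print("points on a rising line: ", round(pearson_r(test_scores, on_rising_line), 2))
print("scattered points:        ", round(pearson_r(test_scores, scattered), 2))
print("points on a falling line:", round(pearson_r(test_scores, on_falling_line), 2))
```

The magnitude reflects how tightly the points hug the line, and the sign reflects whether the line rises or falls.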

Up to this point, we have been assuming that the relationship between two variables is 
linear (i.e., it can be depicted by a straight line). But the relationship might be nonlinear
(sometimes called “curvilinear”). Consider the scatterplot depicted in Figure 2.7. In this 
case a straight line does not represent the shape of the scatterplot at all. But a curved line 
does an excellent job. In this case, although the correlation coefficient might be .00, one 
cannot conclude that there is no association between the variables. We can conclude only 
that there is no linear association. 
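
The point that a correlation of .00 does not rule out an association can be checked with a small example. The sketch below uses invented scores that follow a perfect inverted-U pattern: the second variable rises and then falls as the first increases. The two variables are completely related, yet the linear correlation coefficient is zero. The variable names and values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical inverted-U data: y depends entirely on x, but not in a straight line.
x_scores = [-3, -2, -1, 0, 1, 2, 3]
y_scores = [9 - x ** 2 for x in x_scores]   # rises to a peak, then falls

print("linear correlation:", pearson_r(x_scores, y_scores))
```

Only a look at the scatterplot, or an analysis designed for curved relationships, would reveal the pattern that the coefficient misses.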


Linear Relationship 
between two variables that 
can be depicted by a 
straight line. 

Nonlinear Relationship
between two variables that
cannot be depicted by a
straight line; sometimes
called “curvilinear” and
most easily identified by
examining a scatterplot.














Multiple correlation
coefficient Statistic that
represents the overall linear
association between several
variables (e.g., cognitive
ability, personality,
experience) on the one
hand, and a single variable
(e.g., job performance) on
the other hand.


In this figure we have identified the two variables in question as “stimulation” and
“performance.” This scatterplot would tell us that stimulation and performance are related
to each other, but in a unique way. Up to a point, stimulation aids in successful
performance by keeping the employee alert, awake, and engaged. But beyond that point,
stimulation makes performance more difficult by turning into information overload, which
makes it difficult to keep track of relevant information and to choose appropriate actions.
Most statistics texts that deal with correlation offer detailed descriptions of the methods
for calculating the strength of a nonlinear correlation or association. But for the purposes
of the present discussion, you merely need to know that one of the best ways to detect
nonlinear relationships is to look at the scatterplots. As in Figure 2.7, this nonlinear trend
will be very apparent if it is a strong one. In I-O psychology, many if not most of the
associations that interest us are linear.

FIGURE 2.7 Example of a Curvilinear Relationship

Multiple Correlation 

As we will see in later chapters, there are many situations in which more than one vari¬ 
able is associated with a particular aspect of behavior. For example, you will see that although 
cognitive ability is an important predictor of job performance, it is not the only predic¬ 
tor. Job performance is multiply determined. Other variables that might play a role are 
personality, experience, and motivation. If we were trying to predict job performance, 
we would want to examine the correlation between performance and all of those vari¬ 
ables simultaneously, allowing for the fact that each variable might make an independent 
contribution to understanding job performance. Statistically, we could accomplish this 
through an analysis known as multiple correlation. The multiple correlation coefficient
would represent the overall linear association between several variables (e.g., cognitive ability,
personality, experience, motivation) on the one hand, and a single variable (e.g., job
performance) on the other hand. As you can imagine, these calculations are so complex
that their study is appropriate for an advanced course in prediction or statistics. For our
purposes in this text, you will simply want to be aware that techniques are available for
examining relationships involving multiple predictor variables.
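
Although the general case belongs in an advanced course, the special case of two predictors is small enough to sketch. The example below computes a multiple correlation from the three pairwise correlations among one outcome and two predictors, using the standard two-predictor formula. The data (cognitive ability, years of experience, and performance ratings for six employees) are invented, and the function names are illustrative. The sketch also assumes the two predictors are not perfectly correlated with each other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def multiple_r(outcome, predictor_1, predictor_2):
    """Multiple correlation of one outcome with two predictors.

    Assumes the predictors are not perfectly correlated with each other.
    """
    r_y1 = pearson_r(outcome, predictor_1)
    r_y2 = pearson_r(outcome, predictor_2)
    r_12 = pearson_r(predictor_1, predictor_2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(max(0.0, min(1.0, r_squared)))   # guard against rounding error


# Hypothetical data for six employees.
ability = [85, 90, 70, 60, 95, 75]
experience = [2, 5, 3, 1, 4, 6]
performance = [3.4, 4.5, 3.0, 2.2, 4.4, 3.9]

print("ability alone:    ", round(pearson_r(performance, ability), 2))
print("experience alone: ", round(pearson_r(performance, experience), 2))
print("both predictors:  ", round(multiple_r(performance, ability, experience), 2))
```

The combined coefficient is never smaller than the stronger of the two individual correlations, which is the sense in which each variable can make its own contribution to prediction.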


Correlation and Causation 


Correlation coefficients simply represent the extent to which two variables are associated. 
They do not signal any cause-effect relationship. Consider the example of height and weight. 
They are positively correlated. The taller you are, the heavier you tend to be. But you would 
hardly conclude that weight causes height. If that were the case, we could all be as tall as 
we wish simply by gaining weight. 

In an earlier section of this chapter that dealt with the context of research results, 
we described the anomalous finding that better functioning medical teams appeared to be 
associated with more medical errors. Would it make sense, then, to retain only poorer 
functioning teams? Similarly, we gave the example of less friendly sales personnel in 
convenience stores being associated with higher sales. Would it make sense to fire pleas¬ 
ant sales reps? In both cases, it was eventually discovered that a third variable intervened 
to help us understand the surprising correlation. It became clear that the initial associ¬ 
ation uncovered was not a causal one. 







It is not always easy to separate causes and effects.
The experimental design that you use often determines
what conclusions you can draw. A story is
told of the researcher who interviewed the inhabitants
of a particular neighborhood. He noted that
the young people spoke fluent English. In speaking
with the middle-aged people who would be the
parent generation of the younger people, he found
that they spoke English with a slight Italian accent.
Finally, he spoke with older people (who would represent
the grandparent generation of the youngest
group) and heard a heavy Italian accent. The
researcher concluded that as you grow older, you
develop an Italian accent. It is a safe bet that had
the researcher studied a group of people as they aged,
he would have come to a very different conclusion,
perhaps even an opposite one.

Source: Adapted from Charness (1985), p. xvii.


The question of correlation and causality has important bearing on many of the topics
that we will consider in this book. For example, there are many studies that show a
positive correlation between the extent to which a leader acts in a considerate manner
and the satisfaction of the subordinates of that leader. Because of this correlation, we
might be tempted to conclude that consideration causes satisfaction. But we might be
wrong. Consider two possible alternative explanations for the positive correlation.

1. Do we know that considerate behavior on the part of a business leader causes worker
satisfaction rather than the other way around? It is possible that satisfied subordinates
actually elicit considerate behavior on the part of a leader (and conversely, that a
leader might “crack down” on dissatisfied work group members).

2. Can we be sure that the positive correlation is not due to a third variable? What if
work group productivity were high because of a particularly able and motivated group?
High levels of productivity are likely to be associated with satisfaction in workers, and
high levels of productivity are likely to allow a leader to concentrate on considerate
behaviors instead of pressuring workers for higher production. Thus, a third variable
might actually be responsible for the positive correlation between two other variables.
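
The third-variable explanation can be illustrated with a small simulation. In the sketch below, group productivity is generated first, and both leader consideration and worker satisfaction are built from productivity plus unrelated random noise. Neither one influences the other. The two still end up clearly correlated, and the association largely disappears once productivity is held constant with a first-order partial correlation. Everything here is simulated under assumed values (the seed, the sample of 5,000 groups, and the equal noise levels); it is not data from the leadership studies mentioned above.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with a third variable z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)   # fixed seed so the illustration is repeatable
productivity = [rng.gauss(0, 1) for _ in range(5000)]
# Both variables depend on productivity, but not on each other.
consideration = [p + rng.gauss(0, 1) for p in productivity]
satisfaction = [p + rng.gauss(0, 1) for p in productivity]

r_cs = pearson_r(consideration, satisfaction)
r_cp = pearson_r(consideration, productivity)
r_sp = pearson_r(satisfaction, productivity)
r_cs_given_p = partial_r(r_cs, r_cp, r_sp)

print("consideration with satisfaction:", round(r_cs, 2))
print("same, holding productivity constant:", round(r_cs_given_p, 2))
```

A sizable raw correlation that shrinks toward zero when a third variable is controlled is exactly the pattern a researcher should look for before drawing a causal conclusion.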



Meta-Analysis 


Cancer researchers, clinicians, and patient advocates have engaged in a vigorous debate
over whether women aged 40 to 70 can decrease their chances of dying from breast
cancer by having an annual mammogram. One expert asserts that the earlier cancer can
be detected, the greater the chance of a cure, and that an annual mammogram is the only
reliable means of early detection. Another argues that this is not necessarily true and,
furthermore, because mammograms deliver potentially harmful radiation, they should be used










Meta-analysis Statistical 
method for combining and 
analyzing the results from 
many studies to draw a 
general conclusion about 
relationships among 
variables. 

Statistical artifacts
Characteristics (e.g., small
sample size, unreliable
measures) of a particular
study that distort the
observed results.
Researchers can correct for
artifacts to arrive at a
statistic that represents the
“true” relationship between
the variables of interest.

only every two or three years unless a patient has significant risk factors for the disease. 
Still another says that mammograms give a false sense of security and may discourage patients 
from monitoring their own health. Experts on all sides cite multiple studies to support 
their position. And women are left with an agonizing dilemma: Who is right? What is the 
“truth”? 

As you will see when you wade into the I-O research literature, similar confusion exists
over the interpretation of study results in psychology topics. You may find hundreds of 
studies on the same topic. Each study is done with a different sample, a different sample 
size, and a different observational or experimental environment. By 1976, for example, 
more than 3,000 studies of job satisfaction had been contributed to the literature (Locke, 
1986), a number that now is likely to be closer to 5,000. It is not uncommon for indi¬ 
vidual studies to come to different conclusions. For example, one study of the relationship 
between age and job satisfaction may have administered a locally developed satisfaction 
questionnaire to 96 engineers employed by Company X between the ages of 45 and 57. 
The study might have found a very slight positive correlation (e.g., +.12) between age and 
satisfaction. Another study might have distributed a commercially available satisfaction 
questionnaire to 855 managerial level employees of Company Y between the ages of 27 
and 64. The second study might have concluded that there was a strong positive correla¬ 
tion (e.g., +.56) between age and satisfaction. A third study of 44 outside sales represen¬ 
tatives for Company Z between the ages of 22 and 37 using the same commercially available 
satisfaction questionnaire might have found a slight negative correlation between age and 
satisfaction (e.g., -.15). Which study is “right”? How can we choose among them? 

Meta-analysis is a statistical method for combining results from many studies to draw 
a general conclusion (Murphy, 2002b; Schmidt & Hunter, 2002a). Meta-analysis is based 
on the premise that observed values (like the three correlations shown above) are influenced 
by statistical artifacts (characteristics of the particular study that distort the results). The 
most influential of these artifacts is sample size. Others include the spread of scores and 
the reliability of the measures used (“reliability” is a technical term that refers to the con¬ 
sistency or repeatability of a measurement; we will discuss it in the next section of this 
chapter). Consider the three hypothetical studies we presented above. One had a sample 
size of 96, the second of 855, and the third of 44. Consider also the range of scores on age 
for the three studies. The first had an age range from 45 to 57 (12 years). The second study 
had participants who ranged in age from 27 to 64 (37 years). The participants in the third 
study ranged from 22 to 37 years of age (15 years, with no “older” employees). Finally, 
two of the studies used commercially available satisfaction questionnaires, which very likely
had high reliability, and the remaining study (the first) used a “locally developed” questionnaire which
may have been less reliable. Using these three studies as examples, we would probably have
greater confidence in the study with 855 participants, with an age range of 37 years, and 
that used a more reliable questionnaire. Nevertheless, the other studies tell us something. 
We’re just not sure what that something is, because of the influences of the restricted age 
ranges, the sample sizes, and the reliabilities of the questionnaires. 

In its most basic form, meta-analysis is a complex statistical procedure that includes 
information about these statistical artifacts (sample size, reliability, and range restriction) 
and corrects for their influences, producing an estimate of what the actual relationship is 
in the studies available. But it is possible to include variables beyond these statistical arti¬ 
facts that might also influence results. A good example of such a variable is the nature of 
participants in the study. Some studies might conclude that racial or gender stereotypes 
influence performance ratings, while other studies conclude that there are no such effects. 
If we separate the studies into those done with student participants and those done with 
employees of companies, we might discover that stereotypes have a strong influence on 
student ratings of hypothetical subordinates, but that stereotypes have no influence on 
the ratings of real subordinates by real supervisors. 
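
The most basic step in combining studies, giving larger samples more weight, can be sketched with the three hypothetical age and satisfaction studies described earlier. The code below compares a simple average of the three correlations with a sample-size-weighted average. It also includes a classic correction for unreliable measures, which divides an observed correlation by the square root of the product of the two reliabilities. This is a bare-bones illustration rather than a full meta-analytic procedure, and the reliability values in the last lines are assumed purely for the example.

```python
from math import sqrt

# The three hypothetical studies from the text: sample size and observed correlation.
studies = [
    {"label": "Company X engineers", "n": 96, "r": 0.12},
    {"label": "Company Y managers", "n": 855, "r": 0.56},
    {"label": "Company Z sales representatives", "n": 44, "r": -0.15},
]


def weighted_mean_r(study_list):
    """Average correlation, weighting each study by its sample size."""
    total_n = sum(s["n"] for s in study_list)
    return sum(s["n"] * s["r"] for s in study_list) / total_n


def correct_for_unreliability(r, reliability_x, reliability_y):
    """Estimate the correlation that would be seen with perfectly reliable measures."""
    return r / sqrt(reliability_x * reliability_y)


simple_mean = sum(s["r"] for s in studies) / len(studies)
weighted_mean = weighted_mean_r(studies)

print("simple average of the three correlations:", round(simple_mean, 2))
print("sample-size-weighted average:", round(weighted_mean, 2))

# Assumed reliabilities of .80 for each measure, for illustration only.
print("weighted average corrected for unreliability:",
      round(correct_for_unreliability(weighted_mean, 0.80, 0.80), 2))
```

Because the study of 855 managers dominates the weighted average, the combined estimate sits much closer to that study's result than a simple average would suggest.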




Meta-analysis can be a very powerful research tool. It combines individual studies that
have already been completed and, by virtue of the number and diversity of these studies,
has the potential to “liberate” conclusions that were obscure or confusing at the level of
the individual study. Meta-analyses are appearing with greater regularity in the I-O journals
and represent a real step forward in I-O research. Between 1980 and 1990, approximately
60 research articles in I-O journals used meta-analytic techniques. Between 1991
and 2008, that number had grown to almost 400. The appearance of meta-analysis in all
psychology journals is just as dramatic. Between 1980 and 1990, approximately 1,100 articles
in the general psychology literature used meta-analysis techniques. Between 1990 and 2008,
that number had grown to over 7,800. The actual statistical issues involved in meta-analysis
are incredibly complex, are becoming more complex every day, and are well beyond what you
need to know for this course. Nevertheless, because meta-analysis is becoming so common,
you at least need to be familiar with the term. As an example, we will examine the application
of meta-analysis to the relationship between tests and job performance in Chapter 3.

Micro-, Macro-, and Meso-Research 


In the same spirit by which we introduce you to the term meta-analysis, we need to prepare
you for several other terms you may encounter while reading the research literature,
particularly the literature associated with organizational topics in the last few chapters of
this book. Over the 100-plus years of the development of I-O psychology as a science and
area of practice, there has been an evolution of areas of interest from the individual
differences studied by psychometricians to much broader issues related to teams, groups, and
entire organizations. In later chapters, you will encounter topics such as team training,
group cohesiveness, and organizational culture and climate. In our discussion of Hofstede
(2001), Chao and Moon (2005), and others in Chapter 1, we have already introduced you
to a very broad level of influence called national culture. As a way of characterizing the
research focus of those who are more interested in individual behavior as opposed to those
more interested in the behavior of collections of individuals (e.g., teams, departments,
organizations), the terms micro-research and macro-research were introduced, with micro being
applied to individual behavior and macro being applied to collective behavior (Smith,
Schneider, & Dickson, 2005). But it is obvious that even individual behavior (e.g., job
satisfaction) can be influenced by collective variables (e.g., group or team cohesion,
reputation of the employer, an organizational culture of openness). As a result, a third term,
meso-research (meso literally means “middle” or “between”), was introduced to both
describe and encourage research intended to integrate micro- and macro-studies (Capelli
& Sherer, 1991; House, Rousseau, & Thomas-Hunt, 1995; Rousseau & House, 1994).

Micro-research The study
of individual behavior.

Macro-research The study
of collective behavior.

Meso-research The study
of the interaction of
individual and collective
behavior.

In practice, meso-research is accomplished by including both individual differences data
(e.g., cognitive ability test scores) and collective data (the technological emphasis of the
company, the team culture, etc.) in the same analysis. This type of analysis is known as
multi-level or cross-level analysis (Klein & Kozlowski, 2000) and is much too complex for
a discussion in a basic text such as this. Nevertheless, you will want to be aware that
meso-research is becoming much more common for many of the same reasons we described in
the consideration of “context” earlier in the chapter. Behavior in organizations cannot be
neatly compartmentalized into either micro or macro levels. There are many influences
that cut across levels of analysis.

Many important questions about the experience of work require such a multi-level
consideration (Drenth & Heller, 2004). Even though we don’t expect you to master the
analytic techniques of multi-level research, you should at least be able to recognize these terms
and understand at a basic level what they are meant to convey. Hitt, Beamish, Jackson,










and Mathieu (2007) provide an example of a multi-level phenomenon embedded in the 
universal organizational concept of “strategy.” Although the leaders of the organization 
may very well adopt a “strategy” for building the business plan for the coming year, little 
is known about how that strategy is communicated to the various lower levels in the organ¬ 
ization or, more importantly, how that “strategy” will change the behavior of the work¬ 
ers and supervisors from the behavior they had exhibited last year. Hitt and colleagues 
(2007) suggest that the connection between a firm’s strategy and that firm’s performance 
can only be understood in the context of several levels of behavior, extending from the 
executive boardroom to the production floor. As we will see in the final chapter of this 
book, another excellent example of the value of multi-level considerations is workplace 
safety. Safe behavior results from an intricate combination of individual worker charac¬ 
teristics (e.g., knowledge of how to work safely and abilities to work safely), work team 
influences (the extent to which team members reinforce safe work behavior in each 
other), leader behavior (the extent to which the work group leader adopts and reinforces 
safe work behavior), and the extent to which senior leaders of the organization acknow¬ 
ledge the importance of safe work behavior (Wallace & Chen, 2006). 





MODULE 2.3 SUMMARY 


• Descriptive statistics are expressed in terms of 
absolute values without interpretation. Inferential 
statistics allow a researcher to identify a rela¬ 
tionship between variables. The threshold for 
statistical significance is .05, or 5 occurrences out 
of 100. Statistical power comes from using a 
large enough sample to make results reliable. 

• A statistical index that can be used to estimate 
the strength of a linear relationship between 
two variables is called a correlation coefficient. 
The relationship can also be described graph¬ 
ically, in which case a regression line can be 
drawn to illustrate the relationship. A multiple 
correlation coefficient indicates the strength of 
the relationship between one variable and the 
composite of several other variables. 


• Correlation is a means of describing a relation¬ 
ship between two variables. When examining any 
observed relationship and before drawing any 
causal inferences, the researcher must consider 
whether the relationship is due to a third vari¬ 
able or whether the second variable is causing the 
first rather than vice versa. 

• Meta-analysis, the statistical analysis of multiple 
studies, is a powerful means of estimating rela¬ 
tionships in those studies. It is a complex statis¬ 
tical procedure that includes information about 
statistical artifacts and other variables, and cor¬ 
rects for their influences. 


KEY TERMS 


descriptive statistics 

measure of central tendency 

variability 

skew 

mean 

mode 

median 

inferential statistics 


statistical significance 
statistical power 
measurement 
correlation coefficient 
scatterplot 
regression line 
linear 
nonlinear 


multiple correlation coefficient 

meta-analysis 

statistical artifacts 

micro-research 

macro-research 

meso-research 


So far, we have considered the scientific method, the design of research studies, the 
collection of data, and the statistical analyses of data. All of these procedures prepare us 
for the most important part of research and application: the interpretation of the data based 
on the statistical analyses. The job of the psychologist is to make sense out of what he or
she sees. Data collection and analysis are certainly the foundations of making sense, but
data do not make sense of themselves; they require someone to interpret them.

Any measurement that we take is a sample of some behavioral domain. A test of
reasoning ability, a questionnaire related to satisfaction or stress, and a training grade are
all samples of some larger behavioral domain. We hope that these samples are consistent,
accurate, and representative of the domains of interest. If they are, then we can make accurate
inferences based on these measurements. If they are not, our inferences, and ultimately
our decisions, will be flawed, regardless of whether the decision is to hire someone, institute
a new motivation program, or initiate a stress reduction program. We use measurement
to assist in decision making. Because a sample of behavior is just that (an example
of a type of behavior but not a complete assessment), all samples, by definition, are incomplete
or imperfect. So we are always in a position of having to draw inferences or make
decisions based on incomplete or imperfect measurements. The challenge is to make sure
that the measurements are “complete enough” or “perfect enough” for our purposes.
The technical terms for these characteristics of measurement are reliability and validity.
If a measure is unreliable, we would get different values each time we sampled the
behavior. If a measure is not valid, we are gathering incomplete or inaccurate information.
Although the terms “reliability” and “validity” are most often applied to test scores,
they could be applied to any measure. We must expect reliability and validity from any
measure that we will use to infer something about the behavior of an individual. This includes
surveys or questionnaires, behavioral measures such as counts of production, interview
responses, performance evaluation ratings, and test scores.


Reliability Consistency or 
stability of a measure. 
Validity The accurateness 
of inferences made based 
on test or performance 
data; also addresses
whether a measure 
accurately and completely 
represents what was 
intended to be measured. 


Reliability 


When we say that someone is “reliable,” we mean that he or she is someone we can count
on, someone predictable and consistent, someone we can depend on for help if we
ask for it. The same is true of measures. We need to feel confident that if we took the










measure again, at a different time, or if someone else took the measurement, the value 
would remain the same. Suppose you went for a physical and before you saw the doctor, 
the nurse took your temperature and found it to be 98.6. If the doctor came in five 
minutes later and retook your temperature and reported that it was 101.5, you would be 
surprised. You would have expected those readings to agree, given the short time span 
between measurements. With a discrepancy this large, you would wonder about the skill 
of the nurse, the skill of the doctor, or the adequacy of the thermometer. In technical terms, 
you would wonder about the reliability of that measure. 

Test-Retest Reliability 


Test-retest reliability 
Calculated by correlating 
measurements taken at time 
one with measurements 
taken at time two. 


There are several different aspects to measurement reliability. One aspect is simply the 
temporal consistency—the consistency over time—of a measure. Would we have gotten 
the same value had we taken the measurement next week as opposed to this week, or next 
month rather than this month? If we set out to measure someone’s memory skills and this 
week find that they are quite good, but upon retesting the same person next week we find 
that they are quite poor, what do we conclude? Does the participant have a good memory
or not? Generally speaking, we want our measures to produce the same value over a
reasonable time period. This type of reliability, known as test-retest reliability, is often
calculated as a correlation coefficient between measurements taken at time one and
measurements taken at time two. Consider Figure 2.8. On the left side, you see high
agreement between measures of the same people taken at two different points in time. On the
right side, you find low levels of agreement between the two measurements. The
measurements on the left would be considered reliable, while those on the right would be
considered unreliable, at least from a temporal perspective.
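
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, and the helper name pearson_r is ours rather than anything taken from the text. It correlates scores from one occasion with two hypothetical retests, one that preserves the ordering of people and one that does not.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical memory-test scores for six people (illustrative data only).
time_one = [12, 15, 9, 20, 17, 11]
stable_retest = [13, 14, 10, 19, 18, 11]    # nearly the same ordering of people
unstable_retest = [15, 12, 14, 13, 16, 11]  # ordering largely scrambled

r_stable = pearson_r(time_one, stable_retest)
r_unstable = pearson_r(time_one, unstable_retest)
print(f"stable retest:   r = {r_stable:.2f}")
print(f"unstable retest: r = {r_unstable:.2f}")
```

The first coefficient is close to 1.00 and the second is close to zero, which corresponds to the left and right panels of Figure 2.8, respectively.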


Equivalent Forms Reliability 


Remember when you took the SAT®? The SAT has been administered to millions of high 
school students over the decades since its introduction. But the same SAT items have not 



FIGURE 2.8 Examples of High and Low Test-Retest Reliability: Score Distributions of Individuals Tested on Two Different Occasions






been administered to those millions of students. If that were the case, the answers to those
items would have long since been circulated among dishonest test takers. For many
students, the test would simply be a test of the extent to which they could memorize the
correct answers. Instead, the test developers have devised many different forms of the
examination that are assumed to cover the same general content, but with items unique to each
form. Assume you take the test in Ames, Iowa, and another student takes a different form
of the test in Philadelphia. How do we know that these two forms reliably measure your
knowledge and abilities; that you would have gotten roughly the same score had you switched
seats (and tests) with the other student? Just as was the case in test-retest reliability, you
can have many people take two different forms of the test and see if they get the same
score. By correlating the two test scores, you would be calculating the equivalent forms
reliability of that test. Look at Figure 2.8 again. Simply substitute the term “Form A” for
“Occasion 1” and “Form B” for “Occasion 2” and you will see that the left part of the
figure would describe a test with high equivalent forms reliability, while the test on the
right would show low equivalent forms reliability.

Equivalent forms reliability
Calculated by correlating
measurements from a
sample of individuals who
complete two different
forms of the same test.


Internal Consistency 





As you can see from the examples above, to calculate either test-retest or equivalent forms 
reliability, you would need to have two separate testing sessions (with either the same form 
or different forms). Another way of estimating the reliability of a test is to pretend that 
instead of one test, you really have two or more. A simple example would be to take a 
100-item test and break it into two 50-item tests by collecting all of the even-numbered 
items together and all of the odd-numbered items together. You could then correlate the 
total score for all even-numbered items that were answered correctly with the total score 
for all of the odd-numbered items answered correctly. If the subtest scores correlated highly, 
you would consider the test reliable from an internal consistency standpoint. If we are 
trying to measure a homogeneous attribute (e.g., memory, stress, or interpersonal skills), 
all of the items on the test should give us an equally good measure of that attribute. There 
are more sophisticated ways of estimating internal consistency reliability based on the
average correlation between every pair of test items. A common statistic used to estimate
internal consistency reliability using such averages is known as Cronbach’s alpha.
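
Both ideas in this paragraph, the odd-even split and Cronbach’s alpha, can be illustrated with a short Python sketch. The response data are invented, and the function names are ours. Alpha is computed here with the widely used variance form, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), which is an assumption about presentation: the text describes alpha in terms of average inter-item correlations and does not give a formula.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def split_half_r(item_scores):
    """Correlate totals on odd-numbered items with totals on even-numbered items."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)

def cronbach_alpha(item_scores):
    """Variance form of alpha: k/(k-1) * (1 - sum(item variances) / var(totals))."""
    k = len(item_scores[0])
    item_vars = [variance([person[i] for person in item_scores]) for i in range(k)]
    total_var = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: six test takers by six items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

half_r = split_half_r(responses)
alpha = cronbach_alpha(responses)
print(f"split-half r = {half_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```

With only six items the two estimates differ noticeably, which is consistent with the point that internal consistency depends on both the number of items and the correlations among them.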


Internal consistency Form 
of reliability that assesses 
how consistently the items 
of a test measure a single 
construct; affected by the
number of items in the test 
and the correlations among 
the test items. 


Inter-Rater Reliability 

Often several different individuals make judgments about a person. These judgments might
be ratings of performance of a worker made by several different supervisors, assessments
of the same candidate by multiple interviewers, or evaluations made by several
incumbents about the relative importance of a task in a particular job. In each of these cases,
we would expect the raters to agree regarding what they have observed. We can calculate
various statistical indices to show the level of agreement among the raters. These statistics
would be considered estimates of inter-rater reliability.
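
One simple index of agreement, offered here only as an illustration, is the average correlation between every pair of raters. The text does not single out any particular statistic, so this choice, the invented ratings, and the function names are all assumptions on our part.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_pairwise_r(ratings_by_rater):
    """Average the correlation over every pair of raters."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 1-5 performance ratings of five workers by three supervisors.
supervisor_ratings = [
    [4, 2, 5, 3, 1],  # supervisor A
    [5, 2, 4, 3, 1],  # supervisor B
    [4, 3, 5, 2, 1],  # supervisor C
]

agreement = mean_pairwise_r(supervisor_ratings)
print(f"mean pairwise correlation among raters = {agreement:.2f}")
```

A correlation-based index of this kind reflects whether raters order the workers similarly; it would not reveal one rater being uniformly harsher than another, which is one reason several different indices exist.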

As you can see from our discussion of reliability, there are different ways to calculate
the reliability index, and each may describe a different aspect of reliability. To the extent
that any of the reliability coefficients are less than 1.00 (the ideal coefficient denoting
perfect reliability), we assume there is some error in the observed score and that it is
not a perfectly consistent measure. Nevertheless, measures are not expected to be perfectly
reliable; they are simply expected to be reasonably reliable. The convention is that
values in the range of .70 to .80 represent reasonable reliability. Although we have
considered each of these methods of estimating reliability separately, they all address the
same general issue that we covered earlier in the chapter: generalizability. The question
is, to what extent can we generalize the meaning of a measure taken with one measuring






device at one point in time? A more sophisticated approach to the question of reliability 
is based in generalizability theory (Guion, 1998), which considers all different types of 
error (e.g., test-retest, equivalent forms, and internal consistency) simultaneously, but a 
description of this technique is beyond the scope of this text. Putka and Sackett (2009) 
present an excellent conceptual and historical treatment of the evolution of modern 
reliability theory. 

Validity 


The second characteristic of good measurement is validity. Reliability dealt with whether 
or not we had consistent information on which to base decisions. Validity addresses the 
issue of whether the measurements we have taken accurately and completely represent what 
we had hoped to measure. For example, consider the job of a physician in general
practice. Suppose we wanted to develop a measure of the performance of general practitioners
and that we decided to use malpractice insurance rates over the years as a measure of 
performance. We note that these rates have gone up every year for a particular physician, 
and we conclude that the physician must not be very good. If he or she were good, we 
would have expected such malpractice premiums to have gone down. 

In the physician example, the measure we have chosen to represent performance would
be neither accurate nor complete. Malpractice rates have much less to do with a
particular doctor than they do with claims in general and with amounts awarded by juries in
malpractice lawsuits. Both the number of malpractice suits and the jury awards for those
suits have climbed steadily over the past few decades. As a result, you would note that
malpractice premiums (like car insurance premiums) have climbed steadily every year for
almost every physician. Further, a physician in general practice has a wide variety of duties,
including diagnosis, treatment, follow-up, education, referral, record keeping, continuing
education, and so forth. Even if malpractice premium rates were accurate representations
of performance in certain areas such as diagnosis and treatment, many other areas of
performance would have been ignored by this one measure.

In considering reliability, we examine the extent to which we can infer that the
measure we have is a consistent one, one that is unlikely to change rapidly and unpredictably.
We want to infer that what we are measuring is stable. In both reliability and validity, the 
question is whether what we have measured allows us to make predictions or decisions, 
or take actions, based on what we assume to be the content of those measures. In our 
physician example, if we were deciding whether to allow a physician to keep a medical 
license or to be added to the staff of a hospital, and we based that decision on our chosen 
“performance” measure (malpractice premiums), our decision (or inference that
physicians with a history of increasing premiums were poor performers) would not be a valid
decision or inference. 

You will remember that we concluded our discussion of reliability by introducing the 
concept of generalizability. What we said was that reliability was really a unitary phenomenon 
and that the various estimates of reliability (e.g., test-retest) were really just different ways 
to get at a single issue: consistency of measurement. The important concept to keep in 
mind was generalizability. The same is true of validity. Like reliability, there are several 
different ways to gather information about the accuracy and completeness of a measure. 
Also like reliability, validity is a unitary concept; you should not think that one type of 
validity tells you anything different about the completeness and accuracy of a measure than 
any other type of validity (Binning & Barrett, 1989; Guion, 1998; Landy, 1986, 2007). Like
reliability, validity concerns the confidence with which you can make a prediction or draw 
an inference based on the measurements you have collected. There are three common ways 
of gathering validity evidence. We will describe each of these three ways below. 


Generalizability theory A 
sophisticated approach to 
the question of reliability 
that simultaneously 
considers all types of error 
in reliability estimates (e.g.,
test-retest, equivalent
forms, and internal
consistency).




Although validity is relevant to discussions of any measurement, most validity studies
address the issue of whether an assessment permits confident decisions about hiring or
promotion. Although most validity studies revolve around tests (e.g., tests of personality
or cognitive ability), other assessments (e.g., interviews, application blanks, or even tests
of aerobic endurance) might form the basis of a validity study. For the purposes of this
chapter, we will use hiring and promotion as the examples of the decisions that we have
to make. For such purposes, we have a general hypothesis that people who score higher
or better on a particular measure will be more productive and/or satisfied employees (Landy,
1986). Our validity investigation will be focused on gathering information that will make
us more confident that this hypothesis can be supported. If we are able to gather such
confirming information, we can make decisions about individual applicants with
confidence: our inference about a person from a test score will be valid. Remember,
validity is not about tests, it is about decisions or inferences.

I-O psychologists usually gather validity evidence using one of three common designs. We will
consider each of these designs in turn. All three fit into the same general framework (see
Figure 2.9). The box on the top is labeled “Job Analysis.” Job analysis is a complex and
time-consuming process that we will describe in detail in Chapter 4. For purposes of the
current discussion, you simply need to think of job analysis as a way of identifying the
important demands (e.g., tasks, duties) of a job and the human attributes necessary to
carry out those demands successfully. Once the attributes (e.g., abilities) are identified,
the test that is chosen or developed to assess those abilities is called a predictor, which is
used to forecast another variable. Similarly, when the demands of the job are identified,
the definition of an individual’s performance in meeting those demands is called a
criterion, which is the variable that we want to predict. In Figure 2.9, you will see a line with
an arrow connecting predictors and criteria. This line represents the hypothesis we
outlined above. It is hypothesized that people who do better on the predictor will also do
better on the criterion: people who score higher will be better employees. We gather validity
evidence to test that hypothesis.

FIGURE 2.9 Validation Process from Conceptual and Operational Levels

Criterion-Related Validity

The most direct way to support the hypothesis (i.e., to connect the predictor and criterion
boxes) is to actually gather data and compute a correlation coefficient. In this design,
technically referred to as a criterion-related validity design, you would correlate test scores
with performance measures. If the correlation was positive and statistically significant,
you would now have evidence improving your confidence in the inference that people
with higher test scores have higher performance. By correlating these test scores with the
performance data, you would be calculating what is known as a validity coefficient. The
test might be a test of intelligence and the performance measure might be a supervisor’s
rating. Since we mentioned a “supervisor’s rating,” something becomes immediately
obvious about this design: We are using the test scores of people who are employed by
the organization. This can be done in two different ways.


Predictive Validity 

The first method of conducting a criterion-related study is to test all applicants and then hire
applicants without using those test scores to make the hiring decision. You would then go


Predictor The test chosen 
or developed to assess 
attributes (e.g., abilities)
identified as important for 
successful job performance. 

Criterion An outcome 
variable that describes 
important aspects or 
demands of the job; the
variable that we predict 
when evaluating the validity 
of a predictor. 

Criterion-related validity 
Validity approach that is 
demonstrated by correlating 
a test score with a 
performance measure;
improves researcher's 
confidence in the inference 
that people with higher test 
scores have higher 
performance. 

Validity coefficient 
Correlation coefficient 
between a test score 
(predictor) and a 
performance measure 
(criterion). 








Predictive validity design
Criterion-related validity 
design in which there is a 
time lag between collection 
of the test data and the 
criterion data. 


Concurrent validity design 
Criterion-related validity 
design in which there is no 
time lag between gathering 
the test scores and the 
performance data. 


TABLE 2.3 Variations of Predictive Validity Designs

TYPE OF VALIDITY                PROCEDURE

1. Follow up - random           Test applicants; select randomly; collect criterion data later;
                                correlate test scores and criterion data

2. Follow up - present system   Test applicants; select using procedures used in past; collect
                                criterion data later; correlate test scores and criterion data

3. Select by test               Test applicants; select based on test scores; collect criterion
                                data later; correlate test scores and criterion data

4. Hire, then test              Hire applicants; test during orientation or training; collect
                                criterion data later; correlate test scores and criterion data

5. Shelf research               Hire applicants; collect criterion data later; examine personnel
                                folders for potential predictors; correlate potential predictors
                                and criterion data

Source: Based on Guion & Cranny (1982, pp. 239-44).


back to the organization after some time period had passed (e.g., 6 or 9 months) and collect 
performance data. This design, where there is a time lag between the collection of the test 
data and the criterion data, is known as a predictive validity design because it enables you 
to predict what would have happened had you actually used the test scores to make the 
hiring decisions. If the test scores were related to performance scores, you might conclude 
that you should not have hired some people. Their performance was poor, as were their test 
scores. From the point at which the employer knows that the validity coefficient is positive 
and significant, test scores can be used for making future hiring decisions. The validity
coefficient does not, by itself, tell you what score to designate as a passing score. We will deal
with this issue in Chapter 6, where we consider the actual staffing process. The predictive 
validity design we have described above is only one of many different predictive designs 
you might employ. Some other predictive validity designs are presented in Table 2.3. 
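
As a rough sketch of the basic predictive design, the Python example below correlates test scores gathered at application with supervisor ratings gathered later and then checks the result for statistical significance. Everything specific here is assumed for illustration: the ten data points are invented, the function name is ours, and the significance check uses the standard t statistic for a correlation with a critical value (2.306 for 8 degrees of freedom, two-tailed, .05 level) taken from a t table.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for ten hires: test score at application (predictor) and
# supervisor rating collected about six months later (criterion).
test_scores = [52, 61, 70, 48, 66, 75, 58, 80, 63, 55]
ratings = [3.1, 3.4, 4.0, 2.8, 3.5, 4.2, 3.6, 4.5, 3.3, 3.0]

n = len(test_scores)
validity = pearson_r(test_scores, ratings)  # the validity coefficient

# t statistic for testing whether a correlation differs from zero (df = n - 2).
t_stat = validity * sqrt(n - 2) / sqrt(1 - validity ** 2)

# Two-tailed critical t at the .05 level for df = 8, looked up in a t table.
T_CRITICAL_DF8 = 2.306
is_significant = abs(t_stat) > T_CRITICAL_DF8

print(f"validity coefficient r = {validity:.2f}, t({n - 2}) = {t_stat:.2f}")
print("statistically significant" if is_significant else "not significant")
```

Real validity coefficients are usually far smaller than the one produced by these tidy invented numbers, and a real study would need a much larger sample.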

Concurrent Validity 

In research on many diseases such as cancer and coronary heart disease, researchers carry 
out a process known as a clinical trial. The clinical trial design assigns some patients to a 
treatment group and others to a control group. The treatment group actually gets the
treatment under study (e.g., a pill), whereas the control group does not. Instead, the control
group gets a placebo (e.g., a pill with neutral ingredients). It is difficult to recruit patients 
for many clinical trials because they want to be in the treatment group and don’t want to 
take the chance of being assigned to a control group (although they would not typically 
know to which group they had been assigned). If the treatment is actually effective, it will 
benefit the treatment group patients, but the control group patients will not experience 
the benefits. Many employers and I-O researchers are like the prospective patients for the
control group: they don’t want to wait months or even years to see if the “treatment”
(e.g., an ability test) is effective. While they are waiting for the results, they may be
hiring ineffective performers.

There is a criterion-related validity design that directly addresses that concern. It is called 
the concurrent validity design. This design has no lag between gathering the test scores and 
the performance data because the test in question is administered to current employees 




rather than applicants, and performance measures can be collected on those employees
simultaneously, or concurrently (thus the term concurrent design). Since the employees
are actually working for the organization, the assumption is made that they must be at
least minimally effective, alleviating any concern about adding new employees who are
not minimally effective. As in the case of the predictive design, test scores are correlated
with performance scores to yield a validity coefficient. If it is positive and significant, the
test is then made part of the process by which new employees are hired.

There is a potential disadvantage in using the concurrent design, however. We have
no information about those who are not employed by the organization. This has both
technical and practical implications. The technical implication is that you have range
restriction (you have only the scores of those who scored highly on the predictor), so the
correlation coefficient may be artificially depressed and may not be statistically significant.
There are statistical corrections that can offset that problem. The practical problem is that
there might have been applicants who did less well than the employees did on the test, yet
might have been successful performers. Since they were never hired, the employer will never know.
I-O psychologists have conducted a good deal of research comparing concurrent and
predictive designs, and their general conclusion has been that, even though the concurrent
design might underestimate validity coefficients, in practice this does not usually happen
(Schmitt, Gooding, Noe, & Kirsch, 1984). One final problem with concurrent designs is
that the test-taking motivation may not be as high for those who are already employed.
It is also useful to remember that both concurrent and predictive designs are only two
variations on many different ways to assemble validity data (Guion, 1998; Landy, 1986,
2007). We will now consider two additional methods for collecting validity data.
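
The range restriction problem can be made concrete with a small simulation in Python. This is a sketch under several assumptions that do not come from the text: the applicant pool is simulated with a true test-performance correlation of about .50, only the top 30 percent of scorers are treated as hired, and the correction shown is the commonly cited Thorndike Case II formula for direct restriction on the predictor, r_c = rU / sqrt(1 + r^2(U^2 - 1)), where U is the ratio of the unrestricted to the restricted standard deviation of test scores. The text mentions corrections only in general terms and does not name this one.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def std_dev(values):
    """Sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction (direct restriction on the predictor)."""
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated applicant pool: performance depends partly on the test score,
# with noise chosen so the population correlation is about .50.
test = [random.gauss(50, 10) for _ in range(5000)]
performance = [0.5 * t + random.gauss(0, 8.66) for t in test]

# Concurrent design: we only observe people who were hired (top 30% on the test).
cutoff = sorted(test)[int(0.7 * len(test))]
hired = [(t, p) for t, p in zip(test, performance) if t >= cutoff]
hired_test = [t for t, _ in hired]
hired_perf = [p for _, p in hired]

r_all = pearson_r(test, performance)          # what a full applicant pool would show
r_hired = pearson_r(hired_test, hired_perf)   # what current employees show
r_corrected = correct_for_range_restriction(
    r_hired, std_dev(test), std_dev(hired_test)
)

print(f"all applicants: r = {r_all:.2f}")
print(f"hired only:     r = {r_hired:.2f}")
print(f"corrected:      r = {r_corrected:.2f}")
```

In practice the unrestricted standard deviation has to come from applicant data or test norms, since the employer never observes performance for the people who were not hired.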

Content-Related Validity 

The SIOP Principles (2003) define content-related validation strategy as “a study that
demonstrates that the content of the selection procedure represents an adequate sample of
important work behaviors and activities, and/or worker knowledge, skills, abilities or other
characteristics (KSAOs) defined by the analysis of work” (SIOP, 2003). The job analysis
in Figure 2.9 is an example of this strategy. As another example, assume you are the director
of a temporary employment agency and want to hire applicants who can be assigned to
word processing tasks for companies. You know that these companies typically use either
WordPerfect or Microsoft Word and use either a Macintosh or a PC system. So you ask
the job applicants to demonstrate their proficiency with both of these word processing
packages on both PCs and Macs. Since not all employers have the latest hardware or
software, you also ask the applicants to perform sample word processing tasks on various
versions of the software and different vintages of hardware. By doing this, you have taken
the essence of the work for which you are hiring individuals (word processing on any of
a number of hardware and software configurations) and turned it into a test.

There can be little argument that, at least conceptually, there is a clear link in our example
between test scores and probable performance. Of course, you would also need to
demonstrate that the test you had assembled fairly represented the types of word processing
projects that the temporary employees would encounter. If you were using only the word
processing test, you would also need to show that actual word processing (e.g., as opposed
to developing financial spreadsheets with Excel) is the most important part of the work
for which these temps are hired. If, for example, the temps were hired to answer phones
or manually file records, the test of word processing would be largely irrelevant. But
assuming that the job the temps will be asked to do is word processing, you can infer that
applicants who do better on your test will tend to do better at the actual word processing tasks
in the jobs to which they are assigned. The validity of the inference is based not on a
correlation, but on a logical comparison of the test and the work. To return to Figure 2.9,


Content-related validation 
design Demonstrates that 
the content of the selection 
procedure represents an 
adequate sample of 
important work behaviors 
and activities and/or worker 
KSAOs defined by the job 
analysis. 






Construct validity Validity 
approach in which 
investigators gather 
evidence to support 
decisions or inferences 
about psychological 
constructs; often begins
with investigators 
demonstrating that a test 
designed to measure a 
particular construct 
correlates with other tests 
in the predicted manner. 


Construct Psychological 
concept or characteristic 
that a predictor is intended 
to measure; examples are
intelligence, personality, 
and leadership. 


although the focus of the study is the association between a predictor and a criterion (in 
this case the speed and accuracy of word processing), no criterion information from the 
work setting is collected. 

The example of the word processing test was simple and straightforward. Many jobs 
are not quite as simple as that of a word processor. Consider the position of a manager 
of a cellular telephone store with 5 inside and 15 outside sales and technical
representatives. Suppose the company were to open a companion store in the next town, and needed
to hire a manager for that store. How could we employ a content-related design to gather 
data that would give us confidence in making the hiring decision? The job of manager 
is complex, involving many varied tasks, as well as a wide variety of knowledge, skills, 
abilities, and interpersonal attributes. Using Figure 2.9 as our model, we would analyze 
the job to determine the most important tasks or duties, as well as the abilities needed to 
perform those tasks. How would we do this? By asking experienced employees and
supervisors in other cellular phone stores to give us the benefit of their observations and
personal experience. We would ask them to complete one or more questionnaires that
covered tasks and their importance and necessary abilities. Based on an analysis of their
answers, we could then identify or develop possible predictors for testing manager
candidates. We would then choose the set of predictors that measured abilities that had been
judged to be most closely related to various performance demands for managers. 

Through the use of knowledgeable incumbent employees and supervisors, we would have 
been able to make the logical connection between the predictors and anticipated
performance. Although content-related validation designs for jobs can become rather complex,
we have described the “basic” model so you can get a feel for how the content-related
strategy differs from the criterion-related strategy. But remember, both strategies are
addressing the same basic hypothesis: People who do better on our tests will do better on the job.

Construct-Related Validity 

Calling construct validity a “type” of validity is a historical accident and not really
correct (Landy, 1986). In the 1950s, a task force outlined several ways to gather validity
evidence and labeled three of them: criterion, content, and construct (Cronbach & Meehl,
1955). The labels have stuck. Modern I-O psychology, however, does not recognize that
distinction, referred to sarcastically by Guion (1980) as the “holy trinity.” Instead, as we
have described above, validity is considered “unitarian” rather than “trinitarian.” There
are literally hundreds of ways of gathering evidence that will increase the confidence of 
our decisions or inferences. Criterion-related designs and content-related designs are two 
of the many available approaches (Guion, 1998; Landy, 1986). Every study could have a 
different design, even though some may be more popular than others. The same is true 
with validity designs. Every validity study could have a different design, but criterion- and 
content-related designs are among the most popular, for reasons we will describe below. 

A simple definition of construct validity is that it represents “the integration of evid¬ 
ence that bears on the interpretation or meaning of test scores—including content and 
criterion-related evidence—which are subsumed as part of construct validity” (Messick, 1995, 
p. 742). A construct can be defined as a “concept or characteristic that a predictor is intended 
to measure” (SIOP, 2003). A construct is a broad representation of a human
characteristic. Intelligence, personality, and leadership are all examples of constructs. Memory,
assertiveness, and supportive leader behavior are all examples of these broader entities. 

Examine Figure 2.10. As you can see by comparing this with Figure 2.9, we have
simply added the term “construct” to our generic validation model. The modified figure
demonstrates that constructs are related to both abilities and job demands. Let’s consider the job
of a financial consultant for an investment banking firm. As a result of a job analysis, we 
were able to determine that memory and reasoning were important parts of the job of a 





financial consultant because the job required the consultant to remember data about
various stocks and bonds and to use that information to develop an investment strategy for
an individual client. What is one of the broad and essential attributes necessary to do well
on both a test of reasoning and memory, and to be effective in advising clients on how they
should invest their money? It is intelligence, or cognitive ability. In this case, the construct
is intelligence and we see it as underlying both performance on the test and performance
on the job. In other words, doing well on the job requires the same construct as doing well
on the test.

FIGURE 2.10 A Model for Construct Validity

The contribution of the concept of construct validation is that it encourages the
investigator to cast a broad net in gathering evidence to support decisions or inferences. In a
criterion-related study, there was a tight focus on a test score and a performance score. In
content-related validation, there was a tight focus on a job analysis. In our example of the
financial consultant,

construct validation would allow for evidence from studies that have been done
previously on the topics of intelligence, reasoning, and memory; job analysis information on
the financial consultants in many different firms and industries; well-developed theories
of decision making and memory; and even observations of how memory and reasoning
are used in a broad range of occupational groups. It could also include evidence from
criterion- or content-related studies of the job in the firm that is considering using the
memory and reasoning test. Arvey (1992) presents another example of a construct
validation design in the area of physical ability testing for police officers. That example is shown
in Figure 2.11. In this case, the constructs are strength and endurance rather than
intelligence. In the case of strength, the hypothesis is that strength underlies the ability to
perform bench dips, climb walls, wrestle with a dummy, and drag a dummy in a test
environment, as well as the ability to climb real walls and restrain real suspects. Similarly,
the endurance hypothesis is that individuals who can do well on a mile run can also do
well in pursuing suspects in a foot chase. Endurance is the construct that underlies doing


FIGURE 2.11 Construct Validity Model of Strength and Endurance Physical Factors
(Physical ability test indicators include dummy drag, dummy wrestle, and wall climb.)
Source: Arvey (1992).








well both on the test and on the job. Evidence that bears on these two hypotheses could 
come from a literature review on physical performance, laboratory studies, field studies, 
accepted theories of muscular strength and aerobic endurance, and observations or
interviews of police officers (or even suspects). The key is in the integration of this evidence
to strengthen our confidence that making hiring decisions based on strength or endurance 
measures will lead to more effective performance for police officers. 

The more evidence we can gather, the more confident we can be in our decisions and 
inferences. As you will recall, we began this module by pointing out that we seldom have 
complete information on which to base decisions. We deal in samples of behavior. As a 
result, we must eventually make a decision that the information we have is sufficiently 
reliable, accurate, and comprehensive to make the necessary decisions or draw the
necessary inferences. Sometimes the decisions are small and simple and don’t require a great
deal of evidence, as in the example of hiring the word processing temps. In other
situations, such as the development of a national recruiting and selection program for
financial consultants, the decisions are big and complicated; for these we need a lot of evidence.
The scope of the decisions and inferences will dictate how much evidence we need to be
confident in those decisions. There is no clear bright line that says “collect this much
evidence and no more.” As a general principle, it is more effective to use several different
designs and gather substantial evidence, regardless of what we call the evidence, than to 
depend on a single design, such as a criterion-related or content-related design. The greater 
the accumulation of evidence, the greater our confidence (Landy, 2007). Here we might 
think of the combination of the lab and field study. The lab study provides rigorous 
cause-effect analyses and the field study provides real-world relevance. 

Validity and the Law: A Mixed Blessing 


Until the 1960s, discussions about validity and the validation process were of interest only
to a relatively small community of I-O psychologists and psychometricians. The Civil Rights
Act of 1964 and the resulting development of the Equal Employment Opportunity
Commission changed all that. By 1972 the government was requiring employers to present
validity evidence as a defense against employment discrimination claims. If, for
example, a minority or female applicant complained of failing to receive a job offer because
of unfair discrimination, the employer was required to show that the test or hiring practice
in question was fair and job-related (in I-O terms, “valid”). Furthermore, the
employer was told that the only convincing evidence would be in the form of a criterion-related
study. By 1978 the government had grudgingly broadened its view a bit to permit
content-related and construct-related evidence, but it was clear that the “gold standard”
would still be criterion-related evidence (Landy, 1986).

For the past 30 years, in the context of employment discrimination lawsuits, judges have
issued opinions on the adequacy of validity models and validity evidence, and, more importantly,
on what they believe to be the necessary elements for those models. As a result,
what had originally been three examples of designs for gathering evidence have become
the only three acceptable models or designs, and the characteristics that constitute an “acceptable”
study have become increasingly detailed. This is a good news-bad news situation.
The good news is that the concept of validity is receiving the attention it deserves. The
bad news is that the evolving view of validation is becoming more distant from the way
it was originally conceived and from the way I-O psychologists think of the concept in
the 21st century. Most judges are not particularly talented psychometricians (just as, we
hasten to add, most psychometricians would not necessarily be talented judges). There is
an increasing tension between I-O psychology and the courts on technical issues related
to assessment and decision making (Landy, 2002a, b). The current federal guidelines, which
are often relied upon in discrimination cases (Uniform Guidelines, 1978), are hopelessly
out of date on issues such as validation strategies. As long as they remain an “authority”
in discrimination cases, there will be a serious lack of agreement between I-O psychology
and public (and employer) policy. Box 2.2 describes a lawsuit in which job analysis and
validity play key roles. We will revisit the issue of the interface of law and I-O psychology
in Chapters 6 and 11. For those who find this topic interesting, SIOP has published a book
in the Practice Series that covers virtually every aspect of employment discrimination
litigation (Landy, 2005a).


BOX 2.2 COSTCO AND GENDER DISCRIMINATION

Costco is a major “big box” retail company that has
several hundred stores in the United States, as
well as Canada and several other countries. A large
portion of these stores are managed by males,
although women hold many other management
and administrative functions in the stores. Female
employees of Costco have filed a gender discrimination
suit against the company for failure to
promote women to store manager positions in
proportion to their numbers as Costco employees.
Costco agrees that there are fewer women store
managers, but explains that their promotional
model requires that every store manager will have
been a merchandising manager before becoming
store manager. The merchandising manager is
responsible for setting up the store each day before
the store opens. This includes pricing, displays,
and other aspects of readiness for customers. Merchandising
managers typically work from 4:00 a.m. to
11:00 a.m. I-O psychologists have confirmed the
importance of merchandising experience in store
management through job analysis.

Costco human resources records indicate that,
proportionately, women do not apply for merchandising
manager positions as frequently as men
do. The plaintiffs do not dispute that women avoid
the merchandising manager position. Statisticians
for Costco confirm that if experience as a merchandising
manager is controlled for, women get as
many promotions to store manager as men. The
plaintiffs counter that (a) merchandising experience
is not critical for the position of store manager, and
(b) even if it is, there are other ways to get it
besides working from 4:00 a.m. to 11:00 a.m.
The plaintiffs allege that “gender stereotypes” are
responsible for Costco’s failure to promote women
to store manager positions.

I-O psychologists for Costco contend that the decision
to promote to store manager positions is
clearly valid, job related, and fair to women and that
there is no evidence of stereotyping in decisions to
promote. The case will go to trial in 2010.
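
Box 2.2 refers to promotion rates being compared after prior experience "is controlled for." One simple way to see what that phrase can mean is to compare rates within groups of employees who share the same experience status, rather than across the whole workforce. The sketch below is only an illustration under stated assumptions: every count in it is invented for this example and is not data from Costco or any other real case, and the function names are hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented for illustration only (not from any real case).
# Key: (group, has_prerequisite_experience) -> (number promoted, number of employees)
counts = {
    ("men", True): (30, 100),
    ("men", False): (5, 100),
    ("women", True): (12, 40),
    ("women", False): (8, 160),
}


def overall_rate(group):
    """Promotion rate for a group, ignoring prior experience."""
    promoted = sum(p for (g, _), (p, _n) in counts.items() if g == group)
    total = sum(n for (g, _), (_p, n) in counts.items() if g == group)
    return Fraction(promoted, total)


def stratum_rate(group, experienced):
    """Promotion rate for a group within one experience stratum."""
    promoted, total = counts[(group, experienced)]
    return Fraction(promoted, total)


for group in ("men", "women"):
    print(group, "overall:", float(overall_rate(group)))
    for experienced in (True, False):
        print(group, "experienced" if experienced else "not experienced",
              float(stratum_rate(group, experienced)))
```

With these invented numbers the overall rates differ between the two groups, yet the rates are identical within each experience stratum, because far fewer women hold the prerequisite experience. A comparison of this kind says nothing about whether the prerequisite itself is job related; that is the validity question the two sides in Box 2.2 dispute.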





